Abstract. In this paper we investigate the geometry of a discrete Bayesian
network whose graph is a tree, all of whose variables are binary, and in which the
only observed variables are those labeling its leaves. We obtain a full geometric
description of these models which is given by polynomial equations and
inequalities. Our analysis is based on combinatorial results generalizing the
notion of cumulants so that they apply to the models under analysis. The
geometric structure we obtain links to the notion of a tree metric considered in
phylogenetic analysis and to some interesting determinantal formulas involving
the hyperdeterminant of 2 × 2 × 2 tables.



1. Introduction 

A Bayesian network whose graph is a tree, all of whose inner nodes represent
variables which are not directly observed, lies in an important class of models containing
phylogenetic tree models and hidden Markov models. Inference for this model class
tends to be challenging and often needs to employ fragile numerical algorithms.
In [27] we established a useful new coordinate system for such models when all
of the variables are binary. This analysis enabled us not only to address various
identifiability issues but also helped us to derive exact formulas for the maximum
likelihood estimators given that the sample proportions were consistent with edge
probabilities assigned to this model class.

However, the application of this new coordinate system reaches far beyond
understanding identifiability: it can be used to analyze the global structure
of these tree models. For example [ ] gave an intriguing correspondence between,
on the one hand, a correlation system on tree models and, on the other, distances
induced by trees, where the length between two nodes in a tree is given as the sum of
the lengths of the edges in the path joining them. Our new coordinate system for the
tree models enables us to explore this relationship between probabilistic tree models
and tree metrics in detail. It was already implicit that constraints on possible
distances between any two leaves in the tree imply some inequality constraints on the
possible covariances between the binary variables represented by the leaves. These
inequalities follow from the four-point condition ([ ], Definition 7.1.5) together
with some other simple non-negativity constraints (c.f. Equation (27)). However, in
this paper we also show that these inequality constraints are not sufficient and that
there are some additional constraints involving higher order tree cumulants. We
provide the full set of the defining constraints in Theorem 4.6. This is given by a list
of polynomial equations and inequalities which describe the set of all probability
distributions in the model.

Our approach here is founded in a geometric study of tree models through the
method of phylogenetic invariants first introduced by Lake [ ], and Cavender and
Felsenstein [ ]. These invariant algebraic relationships are expressed as a set of
polynomial equations over the observed probability tables which must hold for a
given phylogenetic model to be valid. We note that these algebraic techniques have
also been embraced by computational algebraic geometers [1][11][26], enhancing the
statistical and computational analyses of such models [ ]. Similar problems can
be solved for other model classes [ ]. The main technical deficiency of using
phylogenetic invariants in this way is that they do not give a full geometric description
of the statistical model. The additional inequalities obtained as the main result of
this paper complete this description. Where and how these inequality constraints
can helpfully supplement an analysis based on phylogenetic invariants is illustrated
by the simple example given below.

Example 1.1. Let $T$ be the tripod tree below where we use the convention that
observed nodes are depicted by shaded nodes.

The inner node represents a binary hidden variable $H$ and the leaves represent
binary observable variables $X_1, X_2, X_3$. The tree represents the conditional
independence statements $X_1 \perp\!\!\!\perp X_2 \perp\!\!\!\perp X_3 \mid H$. The model has full dimension over the
observed margin $(X_1, X_2, X_3)$ and consequently there are no equations defining it.
However it is not a saturated model since not all the marginal probability
distributions over the observed vector $(X_1, X_2, X_3)$ lie in the model. For example Lazarsfeld
[ , Section 3.1] showed that the second moments of the observed distribution must
satisfy

$\mathrm{Cov}(X_1,X_2)\,\mathrm{Cov}(X_1,X_3)\,\mathrm{Cov}(X_2,X_3) \ge 0.$

This constraint, which clearly impacts the inferences we might want to make, is not 
acknowledged through the study of phylogenetic invariants. Therefore inference 
based solely on these invariants is incomplete and in particular naive estimates 
derived through these methods can be infeasible within the model class in a sense 
illustrated later in this paper. 

This example motivated the closer investigation of the semi-algebraic features
associated with the geometry of binary tree models with hidden inner nodes. The
main problem with the geometric analysis of these models is that in general it is hard
to obtain the inequality constraints defining a model even for very simple examples
(see [9, Section 4.3], [12, Section 7]). Despite this, some results can be found in the
literature. In the case of a binary naive Bayes model a somewhat complicated
solution was given by Auvray et al. [ ]. In the binary case there are also some
partial results for general tree structures given by Pearl and Tarsi [ ] and Steel
and Faller [ ]. The most important applications in biology involve variables that
can take four values. Recently Matsen [17] gave a set of inequalities in this case for
group-based phylogenetic models (additional symmetries are assumed) using the
Fourier transformation of the raw probabilities. Here we provide a simpler and
more statistically transparent way to express the constrained space.

In Section 2 of this paper we briefly introduce conditional independence models
on trees. We then proceed to describe a convenient change of coordinates for
the models under consideration following [ ]. In the new coordinate system the
parametrization of the model has an elegant product form and in Section 3 we show
how to use this to obtain the full semi-algebraic description of a simple naive Bayes
model. In Section 4 we state the main result of the paper given by Theorem 4.6
and give some necessary constraints on the probability distributions in the model
class using a correspondence with tree metrics. In Section 5 we discuss these results
for a simple quartet tree model. Finally, in Section 6 we use the parametrization
developed earlier to find an alternative form of equations given by Allman and
Rhodes [ ]. Our alternative specification is simpler from the algebraic point of
view and has a more transparent statistical interpretation. We prove our main
theorem in Section B. The paper is concluded with a short discussion.



2. Tree models and tree cumulants 

In this paper we always assume that random variables are binary taking values
either 0 or 1. We consider models with hidden variables, i.e. variables whose values
are never directly observed. The vector Y has as its components all variables in 
the graphical model, both those that are observed and those that are hidden. The 
subvector of Y of observed variables is denoted by X and the subvector of hidden 
variables by H. 

A (directed) tree $T = (V, E)$, where $V$ is the set of vertices and $E \subseteq V \times V$ is the
set of edges of $T$, is a connected (directed) graph with no cycles. A rooted tree is a
directed tree that has one distinguished vertex called the root, denoted by the letter
$r$, and all the edges are directed away from $r$. A rooted tree is usually denoted by
$T^r$. By $\mathrm{pa}(v)$ we denote the node preceding $v$ in $T^r$. In particular $\mathrm{pa}(r) = \emptyset$. A
vertex of $T$ of degree one is called a leaf. A vertex of $T$ that is not a leaf is called
an inner node.

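A Markov process on a rooted tree $T^r$ is a sequence $\{Y_v : v \in V\}$ of random
variables such that for each $\alpha = (\alpha_v)_{v \in V} \in \{0,1\}^{|V|}$

(1) $p_\alpha(\theta) = \prod_{v \in V} \theta^{(v)}_{\alpha_v|\alpha_{\mathrm{pa}(v)}},$

where $\theta = (\theta^{(v)}_{\alpha_v|\alpha_{\mathrm{pa}(v)}})$ and $\theta^{(v)}_{\alpha_v|\alpha_{\mathrm{pa}(v)}} = P(Y_v = \alpha_v \mid Y_{\mathrm{pa}(v)} = \alpha_{\mathrm{pa}(v)})$. Since
$\theta^{(r)}_0 + \theta^{(r)}_1 = 1$ and $\theta^{(v)}_{0|i} + \theta^{(v)}_{1|i} = 1$ for all $v \in V \setminus \{r\}$ and $i = 0, 1$, the set of parameters
consists of exactly $2|E| + 1$ free parameters: we have two parameters $\theta^{(v)}_{1|0}, \theta^{(v)}_{1|1}$ for
each edge $(u, v) \in E$ and one parameter $\theta^{(r)}_1$ for the root. We denote the parameter
space by $\Theta_T = [0, 1]^{2|E|+1}$ and the model given by this parameterization by $\widetilde{\mathcal{M}}_T$.

Let $n$ be the number of leaves of $T$ and let $\Delta_{2^n-1} = \{p \in \mathbb{R}^{2^n} : \sum_\beta p_\beta = 1, p_\beta \ge 0\}$
with indices $\beta$ ranging over $\{0, 1\}^n$ be the probability simplex of all possible
distributions of $X = (X_1, \ldots, X_n)$ represented by the leaves of $T$. Equation (1)
induces a polynomial map $f_T : \Theta_T \to \Delta_{2^n-1}$ obtained by marginalization over all
the inner nodes of $T$

(2) $p_\beta(\theta) = \sum_{\mathcal{H}} \prod_{v \in V} \theta^{(v)}_{\alpha_v|\alpha_{\mathrm{pa}(v)}},$

where $\mathcal{H}$ are all possible states of the vector of hidden variables, i.e. the sum is over
$\alpha_{V \setminus [n]} \in \{0, 1\}^{|V|-n}$ with $\alpha_{[n]} = \beta$, and for any $A \subseteq V$, $\alpha_A = (\alpha_i)_{i \in A}$. We denote $\mathcal{M}_T = f_T(\Theta_T)$
calling it the general Markov model (c.f. [ , Section 8.3]). A semi-algebraic set in
$\mathbb{R}^d$ is any set given by a finite number of polynomial equations and inequalities.
Since $\Theta_T$ is a semi-algebraic set and $f_T$ is a polynomial map then by the Tarski-
Seidenberg theorem [ , Section 2.5.2] $\mathcal{M}_T$ is a semi-algebraic set as well.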

In [ ] we described a convenient change of coordinates for directed tree models
as a function of the usual parametrization (2) which is expressed in terms of the
probabilities. The idea was to define a regular one-to-one polynomial map $f_{p\kappa}$ from
$\Delta_{2^n-1}$ to the space of new parameters called tree cumulants $\mathcal{K}_T$. We defined a
partially ordered set (poset) of all the partitions of the set of leaves induced by
removing inner edges of the given tree $T$. Then tree cumulants are given as a
one-to-one function of probabilities induced by a Möbius function on the poset.
The details of this change of coordinates are given in Appendix A and are illustrated
below.

The tree cumulants are given by $2^n - 1$ coordinates: $\bar\mu_i$ for all $i \in [n]$ and a set
of real-valued parameters $\{\kappa_I : I \subseteq [n] \text{ where } |I| \ge 2\}$. The first $n$ coordinates
are linear functions of the means of the $n$ observed variables since $\bar\mu_i = 1 - 2EX_i$.
Each formula for $\kappa_I$ is expressed as a function of the higher order central moments
of the observed variables. These formulas are given explicitly in (31) of Appendix
A. By $\mathcal{M}_T^\kappa$ we will denote the image of the original model $\mathcal{M}_T$ in the space of tree
cumulants. We note that since $f_{p\kappa}$ is a one-to-one polynomial map then by the
Tarski-Seidenberg theorem (see [ , Section 2.5.2]) $\mathcal{M}_T^\kappa$ is a semi-algebraic set. In
this paper we provide the full semi-algebraic description of $\mathcal{M}_T^\kappa$, i.e. the complete
set of polynomial equations and inequalities involving the tree cumulants which
describes $\mathcal{M}_T^\kappa$ as the subset of $\mathcal{K}_T$.

Example 2.1. Consider the quartet tree model, i.e. the general Markov model
given by the following graph (c.f. Section 6 in [ ]).



The tree cumulants are given by 15 coordinates: $\bar\mu_i = 1 - 2P(X_i = 1)$ for $i =
1,2,3,4$ and $\kappa_I$ for $I \subseteq [4]$ such that $|I| \ge 2$. Denoting $U_i = X_i - EX_i$ we have
$\kappa_{ij} = E(U_iU_j) = \mathrm{Cov}(X_i, X_j)$ for $1 \le i < j \le 4$ and

$\kappa_{ijk} = E(U_iU_jU_k)$

for all $1 \le i < j < k \le 4$ which we note is a third order central moment. However
tree cumulants of higher order cannot be equated to corresponding central moments
but only expressed as functions of them. These functions are obtained by performing
an appropriate Möbius inversion (see [27]). Thus for example from Appendix
A we have that

$\kappa_{1234} = E(U_1U_2U_3U_4) - E(U_1U_2)\,E(U_3U_4).$

But note that since the observed higher order central moments can be expressed
as functions of probabilities, tree cumulants being functions of such moments can
also be expressed as functions of these probabilities.

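Now let $\Omega_T$ denote the set of parameters with coordinates given by $\bar\mu_v$ for $v \in V$
and $\eta_{u,v}$ for $(u, v) \in E$. Define a reparametrization map $f_{\theta\omega} : \Theta_T \to \Omega_T$ as follows:
for every directed edge $(u,v) \in E$

(3) $\eta_{u,v} = \theta^{(v)}_{1|1} - \theta^{(v)}_{1|0}$ and $\bar\mu_v = 1 - 2\lambda_v$ for each $v \in V$,

where $\lambda_v = EY_v$ is a polynomial in the original parameters $\theta$ of degree depending
on the distance of $v$ from the root $r$. Indeed, let $r, v_1, \ldots, v_k, v$ be a directed path
in $T$. Then

(4) $\lambda_v = P(Y_v = 1) = \sum_{\alpha \in \{0,1\}^{k+1}} \theta^{(v)}_{1|\alpha_k}\, \theta^{(v_k)}_{\alpha_k|\alpha_{k-1}} \cdots \theta^{(v_1)}_{\alpha_1|\alpha_r}\, \theta^{(r)}_{\alpha_r}.$

It can be easily checked that $\eta_{u,v} = \mathrm{Cov}(Y_u, Y_v)/\mathrm{Var}(Y_u)$ and hence

$(1 - \bar\mu_u^2)\,\eta_{u,v} = (1 - \bar\mu_v^2)\,\eta_{v,u}.$

It also follows that $\eta_{u,v}$ is just the linear regression coefficient of $Y_v$ with respect
to $Y_u$, namely $E(Y_v - EY_v \mid Y_u) = \eta_{u,v}(Y_u - EY_u)$, which gives a clear statistical
interpretation for the new parameters. The parameter space $\Omega_T$ is given by the
following constraints: $\bar\mu_v \in [-1, 1]$ for all $v \in V$ and

(5) $-\min\{(1 + \bar\mu_u)(1 + \bar\mu_v),\ (1 - \bar\mu_u)(1 - \bar\mu_v)\} \le (1 - \bar\mu_u^2)\,\eta_{u,v} \le \min\{(1 + \bar\mu_u)(1 - \bar\mu_v),\ (1 - \bar\mu_u)(1 + \bar\mu_v)\}.$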
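
The reparametrization (3) and the constraint (5) are easy to evaluate numerically for a single edge. The following Python sketch is an illustration only: the function names `edge_parameters` and `satisfies_constraint_5`, the tolerance, and the numerical inputs are assumptions made for this example and are not taken from [27].

```python
def edge_parameters(p_u1, p_v1_given_u0, p_v1_given_u1):
    """Map the probabilities on one edge (u, v) to (mu_bar_u, mu_bar_v, eta_uv).

    p_u1          : P(Y_u = 1)
    p_v1_given_u0 : P(Y_v = 1 | Y_u = 0)
    p_v1_given_u1 : P(Y_v = 1 | Y_u = 1)
    """
    lam_u = p_u1
    lam_v = (1 - p_u1) * p_v1_given_u0 + p_u1 * p_v1_given_u1   # special case of (4)
    eta_uv = p_v1_given_u1 - p_v1_given_u0                      # equation (3)
    return 1 - 2 * lam_u, 1 - 2 * lam_v, eta_uv


def satisfies_constraint_5(mu_u, mu_v, eta_uv, tol=1e-12):
    """Check the two-sided inequality (5) for one edge, up to a small tolerance."""
    middle = (1 - mu_u ** 2) * eta_uv
    lower = -min((1 + mu_u) * (1 + mu_v), (1 - mu_u) * (1 - mu_v))
    upper = min((1 + mu_u) * (1 - mu_v), (1 - mu_u) * (1 + mu_v))
    return lower - tol <= middle <= upper + tol


# Hypothetical edge: P(Y_u = 1) = 0.8, P(Y_v = 1 | 0) = 0.3, P(Y_v = 1 | 1) = 0.8.
mu_u, mu_v, eta = edge_parameters(0.8, 0.3, 0.8)
print(mu_u, mu_v, eta, satisfies_constraint_5(mu_u, mu_v, eta))

# A point that does not come from any pair of binary variables violates (5).
print(satisfies_constraint_5(0.0, 0.9, 1.0))
```

Any triple produced by `edge_parameters` from valid probabilities should pass the check, since (5) is simply the statement that the four cell probabilities of $(Y_u, Y_v)$ are non-negative.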

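In [27] we proved that there is a one-to-one polynomial map between the two
spaces giving the following diagram.

(6) $\begin{array}{ccc} \Theta_T & \xrightarrow{\ f_T\ } & \Delta_{2^n-1} \\ \downarrow{\scriptstyle f_{\theta\omega}} & & \downarrow{\scriptstyle f_{p\kappa}} \\ \Omega_T & \xrightarrow{\ \psi_T\ } & \mathcal{K}_T \end{array}$

One motivation behind the change of coordinates and parameters is that the induced
parametrization $\psi_T : \Omega_T \to \mathcal{K}_T$ has a particularly elegant form in terms of the new
parameters.

Proposition 2.2 ([27], Proposition 3). Let $T = (V,E)$ be a rooted trivalent tree
with $n$ leaves. Then $\mathcal{M}_T^\kappa$ is parametrized by the map $\psi_T : \Omega_T \to \mathcal{K}_T$ given as an
identity on the first $n$ coordinates corresponding to $\bar\mu_i$ for $i \in [n]$ and on the other
coordinates it is given by

(7) $\kappa_I = \frac{1}{4}(1 - \bar\mu_{r(I)}^2) \prod_{v \in N(I)} \bar\mu_v^{\deg(v)-2} \prod_{(u,v) \in E(I)} \eta_{u,v}$ for $I \subseteq [n]$, $|I| \ge 2$,

where the degree is taken in $T(I) = (V(I), E(I))$; $N(I)$ denotes the set of inner
nodes of $T(I)$ and $r(I)$ denotes the root of $T(I)$.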
Example 2.3. Consider the model in Example 2.1. The model is parametrized
as in (2) by the root distribution and transition matrices attached to each of the
edges. Say $\theta^{(r)}_1 = 0.8$, $\theta^{(1)}_{1|1} = 0.8$, $\theta^{(1)}_{1|0} = 0.3$, $\theta^{(2)}_{1|1} = 0.7$, $\theta^{(2)}_{1|0} = 0.3$, $\theta^{(a)}_{1|1} = 0.8$,
$\theta^{(a)}_{1|0} = 0.3$, $\theta^{(3)}_{1|1} = 0.7$, $\theta^{(3)}_{1|0} = 0.3$, $\theta^{(4)}_{1|1} = 0.7$, $\theta^{(4)}_{1|0} = 0.3$. Using (2) we provide the
corresponding probabilities over the observed nodes in the third column in the table
below. The change of coordinates presented in Appendix A gives the corresponding
non-central moments $\lambda_I = E(\prod_{i \in I} X_i)$ and tree cumulants $\kappa_I$ supplemented with
the means $\lambda_1, \lambda_2, \lambda_3, \lambda_4$.



$\alpha$   $I$           $p_\alpha$   $\lambda_I$   $\kappa_I$
0000       $\emptyset$   0.0444       1.0000        1.0000
0001       4             0.0307       0.5800
0010       3             0.0307       0.5800
0011       34            0.0403       0.3700        0.0336
0100       2             0.0346       0.6200
0101       24            0.0323       0.3724        0.0128
0110       23            0.0323       0.3724        0.0128
0111       234           0.0547       0.2422        -0.0020
1000       1             0.0482       0.7000
1001       14            0.0491       0.4220        0.0160
1010       13            0.0491       0.4220        0.0160
1011       134           0.0875       0.2750        -0.0026
1100       12            0.0828       0.4660        0.0320
1101       124           0.0979       0.2853        -0.0038
1110       123           0.0979       0.2853        -0.0038
1111       1234          0.1875       0.1875        0.0006



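The formula (3) gives: $\eta_{r,1} = 0.5$, $\eta_{r,2} = 0.4$, $\eta_{r,a} = 0.5$, $\eta_{a,3} = 0.4$, $\eta_{a,4} = 0.4$ and
$\bar\mu_1 = -0.4$, $\bar\mu_2 = -0.24$, $\bar\mu_3 = -0.16$, $\bar\mu_4 = -0.16$, $\bar\mu_r = -0.6$, $\bar\mu_a = -0.4$. It is easy
to verify (7). For example

$\kappa_{1234} = \frac{1}{4}(1 - \bar\mu_r^2)\,\bar\mu_r\bar\mu_a\,\eta_{r,1}\eta_{r,2}\eta_{r,a}\eta_{a,3}\eta_{a,4} = 0.0006.$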
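
The numbers of this example can be reproduced by brute-force enumeration. The Python sketch below is only an illustration under stated assumptions: the dictionary layout, the helper names (`bern`, `leaf_distribution`, `central_moment`) and the 0-based leaf positions are ours and hypothetical; the conditional probabilities are the ones quoted in the example. It computes the leaf distribution by marginalizing over the hidden nodes $r$ and $a$ as in (2), and compares the moment expression $E(U_1U_2U_3U_4) - E(U_1U_2)E(U_3U_4)$ with the product formula (7).

```python
from itertools import product

ROOT_P1 = 0.8            # P(Y_r = 1)
EDGES = {                # child: (parent, P(child = 1 | parent = 0), P(child = 1 | parent = 1))
    "1": ("r", 0.3, 0.8),
    "2": ("r", 0.3, 0.7),
    "a": ("r", 0.3, 0.8),
    "3": ("a", 0.3, 0.7),
    "4": ("a", 0.3, 0.7),
}
LEAVES = ["1", "2", "3", "4"]


def bern(p1, value):
    """Probability that a binary variable with P(1) = p1 takes the given value."""
    return p1 if value == 1 else 1.0 - p1


def leaf_distribution():
    """Sum the joint probability over the states of the hidden nodes r and a."""
    p = {}
    for x in product((0, 1), repeat=4):
        total = 0.0
        for r, a in product((0, 1), repeat=2):
            state = dict(zip(LEAVES, x), r=r, a=a)
            weight = bern(ROOT_P1, r)
            for child, (parent, p10, p11) in EDGES.items():
                weight *= bern(p11 if state[parent] == 1 else p10, state[child])
            total += weight
        p[x] = total
    return p


def central_moment(p, subset):
    """E(prod_{i in subset} U_i), U_i = X_i - E X_i; subset holds 0-based leaf positions."""
    mean = [sum(prob * x[i] for x, prob in p.items()) for i in range(4)]
    result = 0.0
    for x, prob in p.items():
        term = prob
        for i in subset:
            term *= x[i] - mean[i]
        result += term
    return result


p = leaf_distribution()
kappa_from_moments = (central_moment(p, (0, 1, 2, 3))
                      - central_moment(p, (0, 1)) * central_moment(p, (2, 3)))

# Right-hand side of (7) for I = {1, 2, 3, 4}, with eta and mu-bar obtained from (3).
eta = {child: p11 - p10 for child, (_, p10, p11) in EDGES.items()}
lam_a = (1 - ROOT_P1) * EDGES["a"][1] + ROOT_P1 * EDGES["a"][2]
mu_r, mu_a = 1 - 2 * ROOT_P1, 1 - 2 * lam_a
kappa_from_parameters = (0.25 * (1 - mu_r ** 2) * mu_r * mu_a
                         * eta["1"] * eta["2"] * eta["a"] * eta["3"] * eta["4"])

print(round(p[(1, 1, 1, 1)], 4), kappa_from_moments, kappa_from_parameters)
```

Up to rounding, the two values of $\kappa_{1234}$ agree with each other and with the entry 0.0006 in the last row of the table, and `p[(1, 1, 1, 1)]` rounds to the tabulated 0.1875.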

Proposition 2.2 has been formulated for trivalent trees. However it can be easily
extended to a more general case. For a given tree a contraction of an edge $(u, v)$
results in another tree obtained from the original tree by identifying the nodes $u$
and $v$ and removing the edge $(u, v)$. Let $T$ be a tree and let $\tilde{T}$ be any trivalent tree
such that $T$ is obtained from $\tilde{T}$ by edge contractions. Then $\mathcal{M}_T^\kappa \subseteq \mathcal{M}_{\tilde{T}}^\kappa \subseteq \mathcal{K}_{\tilde{T}}$ and
by Corollary 4 in [27] the parameterization in (7) remains valid for $T$ but expressed
in the coordinates of $\mathcal{K}_{\tilde{T}}$.

Example 2.4. Let $T$ be a star tree with four leaves, i.e. a tree with one inner node
$r$ and four leaves connected to $r$ by edges $(r, i)$ for $i = 1,2,3,4$. This tree can be
obtained from the quartet tree in Example 2.1, denote it by $\tilde{T}$, by contracting $(r, a)$.
The model of the star tree can be realized as a subset of $\mathcal{K}_{\tilde{T}}$, i.e. the space of tree
cumulants for the quartet tree. The coordinates of $\mathcal{K}_{\tilde{T}}$ are obtained in Example 2.1
and the parametrization of $\mathcal{M}_T^\kappa$ is given for example by

$\kappa_{1234} = \frac{1}{4}(1 - \bar\mu_r^2)\,\bar\mu_r^2\,\eta_{r,1}\eta_{r,2}\eta_{r,3}\eta_{r,4}.$

Note however that this star tree may be obtained from many different trivalent trees
by edge contraction. It follows that there exist many ways to embed the model and
retain the parametrization.

Remark 2.5. Let $T^r$ be a rooted tree and $T$ its undirected version. Then $\mathcal{M}_T^\kappa$
depends only on $T$. Indeed, without loss of generality fix two different rootings $r$ and $r'$. Let $T$
be a tree rooted in $r$ and by $T'$ denote its copy rooted in $r'$. Then $\mathcal{M}_T^\kappa = \mathcal{M}_{T'}^\kappa$ and
the parameters $(\eta_{u,v}), (\bar\mu_v)$ and $(\eta'_{u,v}), (\bar\mu'_v)$ are related as follows. We have $\bar\mu_v = \bar\mu'_v$
for all $v \in V$. Moreover, if $(u, v) \in E \cap E'$ then $\eta_{u,v} = \eta'_{u,v}$ and if $(u, v) \in E \setminus E'$
then $(1 - \bar\mu_u^2)\,\eta_{u,v} = (1 - \bar\mu_v^2)\,\eta'_{v,u}$. Note however that if for example $\eta_{u,v} = 0$ and
$\bar\mu_v^2 = 1$ then $\eta'_{v,u}$ is not well defined and in this case we set $\eta'_{v,u} = 0$. From the
form of inequalities in (5) constraints on $\bar\mu_v$ and $\eta_{u,v}$ are satisfied if and only if the
constraints on $\bar\mu'_v$ and $\eta'_{u,v}$ are satisfied.

3. The semi-algebraic description of the tripod tree model 

In this section we obtain the full semi-algebraic description of the tripod tree 
model. This result is not new (see [2] [22]). However it is convenient to give a new 
proof of this result both to unify notation and to introduce the strategy which is 
used to attack the general case later. We begin with a definition. 

Definition 3.1. Let $A$ be a $2 \times 2 \times 2$ table. The hyperdeterminant of $A$ as defined
by Gelfand, Kapranov, Zelevinsky [ , Chapter 14] is given by

$$\begin{aligned} \mathrm{Det}\,A = {} & (a_{000}^2a_{111}^2 + a_{001}^2a_{110}^2 + a_{010}^2a_{101}^2 + a_{011}^2a_{100}^2) \\ & - 2(a_{000}a_{001}a_{110}a_{111} + a_{000}a_{010}a_{101}a_{111} + a_{000}a_{011}a_{100}a_{111} \\ & \qquad + a_{001}a_{010}a_{101}a_{110} + a_{001}a_{011}a_{110}a_{100} + a_{010}a_{011}a_{101}a_{100}) \\ & + 4(a_{000}a_{011}a_{101}a_{110} + a_{001}a_{010}a_{100}a_{111}). \end{aligned}$$

If $\sum a_{ijk} = 1$ then treating all entries formally as joint cell probabilities (without
positivity constraints) we can simplify this formula using the change of coordinates
to central moments. The reparameterizations in Appendix A are well defined for
this extended space of probabilities and we have that

(8) $\mathrm{Det}\,A = \mu_{123}^2 + 4\mu_{12}\mu_{13}\mu_{23},$

which can be verified by direct computations. We note in passing that a similar
idea of treating moments formally lies behind the umbral calculus [20].

From the construction of tree cumulants (c.f. Appendix A) it follows that
$\kappa_I = \mu_I$ for all $I \subseteq [n]$ such that $|I| \le 3$. Henceforth, for clarity, these lower
order tree cumulants will be written as their more familiar corresponding central
moments.

Lemma 3.2 (The semi-algebraic description of the tripod model). Let $\mathcal{M}_T$ be
the general Markov model on a tripod tree $T$ rooted in any node of $T$. Let $P$ be a
$2 \times 2 \times 2$ probability table for three binary random variables $(X_1, X_2, X_3)$ with central
moments $\mu_{12}, \mu_{13}, \mu_{23}, \mu_{123}$ (equivalent to the corresponding tree cumulants) and
$\bar\mu_i = 1 - 2EX_i$ for $i = 1, 2, 3$. Then $\mathcal{M}_T^\kappa$ is given by

(9) $\mu_{ij} = \frac{1}{4}(1 - \bar\mu_h^2)\,\eta_{h,i}\eta_{h,j}$ for all $i \ne j \in \{1, 2, 3\}$ and
$\mu_{123} = \frac{1}{4}(1 - \bar\mu_h^2)\,\bar\mu_h\eta_{h,1}\eta_{h,2}\eta_{h,3},$

where $\bar\mu_h \in [-1, 1]$ and $(1 - \bar\mu_h^2)\,\eta_{h,i}$ for all $i = 1,2,3$ satisfy the inequality (5).
Moreover, $P \in \mathcal{M}_T$ if and only if $K = f_{p\kappa}(P) \in \mathcal{K}_T = \mathcal{C}_3$ satisfies the following
inequalities

(10) $\mu_{12}\mu_{13}\mu_{23} \ge 0,$

(11) $\mu_{12}^2\mu_{13}^2 + \mu_{12}^2\mu_{23}^2 + \mu_{13}^2\mu_{23}^2 \le \mathrm{Det}\,P \le \min_{1 \le i < j \le 3} \mu_{ij}^2,$

and

(12) $\mathrm{Det}\,P \le \min\left\{((1 + \bar\mu_i)\mu_{jk} - \mu_{123})^2,\ ((1 - \bar\mu_i)\mu_{jk} + \mu_{123})^2\right\},$

for all $i = 1,2,3$ where by $j, k$ we denote elements of $\{1, 2, 3\} \setminus i$.

Proof. By Remark 2.5 $\mathcal{M}_T^\kappa$ does not depend on the rooting and hence we can
assume that $T$ is rooted in $h$. The parameterization in (9) follows from Proposition
2.2 by considering $T$ rooted at $h$ and the corresponding independence statements
$X_1 \perp\!\!\!\perp (X_2, X_3) \mid H$ and $X_2 \perp\!\!\!\perp X_3 \mid H$.

Denote by $\mathcal{M}$ the subset of $\mathcal{K}_T$ given by inequalities in (10), (11) and (12). We
need to show that $\mathcal{M} = \mathcal{M}_T^\kappa$ for any rooting of $T$. First we prove that $\mathcal{M}_T^\kappa \subseteq \mathcal{M}$.
Let $K = \psi_T(\omega)$ for some $\omega \in \Omega_T$ with coordinates given by $\bar\mu_h$ and $\bar\mu_i$, $\eta_{h,i}$ for
$i = 1,2,3$. Using (9) we obtain

(13) $\mu_{12}\mu_{13}\mu_{23} = \left(\frac{1}{4}(1 - \bar\mu_h^2)\right)^3 (\eta_{h,1}\eta_{h,2}\eta_{h,3})^2.$

Since $\bar\mu_h \in [-1, 1]$ this implies the inequality in (10). Moreover, we have

(14) $\mathrm{Det}\,P = \mu_{123}^2 + 4\mu_{12}\mu_{13}\mu_{23} = \frac{1}{16}(1 - \bar\mu_h^2)^2(\eta_{h,1}\eta_{h,2}\eta_{h,3})^2.$

Multiplying both sides by $\bar\mu_h^2$ together with the second equation in (9) implies

(15) $\bar\mu_h^2\,\mathrm{Det}\,P = \mu_{123}^2,\qquad (1 - \bar\mu_h^2)\,\mathrm{Det}\,P = 4\mu_{12}\mu_{13}\mu_{23}.$

On the other hand (9) and (14) imply also that

(16) $\eta_{h,i}^2\,\mu_{jk}^2 = \mathrm{Det}\,P$ for all $i = 1, 2, 3$.

Again by substituting $\frac{1}{4}(1 - \bar\mu_h^2)\,\eta_{h,i}\eta_{h,j}$ for $\mu_{ij}$ it can be shown that

(17) $\mu_{12}^2\mu_{13}^2 + \mu_{12}^2\mu_{23}^2 + \mu_{13}^2\mu_{23}^2 = \frac{1}{16}(1 - \bar\mu_h^2)^2(\eta_{h,1}^2 + \eta_{h,2}^2 + \eta_{h,3}^2)\,\mathrm{Det}\,P.$

Since necessarily $\eta_{h,i}^2, \bar\mu_h^2 \in [0, 1]$ then (15), (16) and (17) imply that

$\mu_{12}^2\mu_{13}^2 + \mu_{12}^2\mu_{23}^2 + \mu_{13}^2\mu_{23}^2 \le \mathrm{Det}\,P \le \min_{i<j} \mu_{ij}^2.$

To show that $K$ satisfies (12) first divide this inequality by $\mu_{jk}^2$ (if it is zero the
inequality is trivially satisfied since $\mathrm{Det}\,P = \mu_{123}^2$). Using (16) and the fact that
$\mu_{123}/\mu_{jk} = \bar\mu_h\eta_{h,i}$ we obtain

$\eta_{h,i}^2 \le \min\left\{((1 + \bar\mu_i) - \bar\mu_h\eta_{h,i})^2,\ ((1 - \bar\mu_i) + \bar\mu_h\eta_{h,i})^2\right\}.$

This is equivalent to

$\left(\eta_{h,i}(1 - \bar\mu_h^2) + \bar\mu_h(1 + \bar\mu_i)\right)^2 \le (1 + \bar\mu_i)^2,$
$\left(\eta_{h,i}(1 - \bar\mu_h^2) - \bar\mu_h(1 - \bar\mu_i)\right)^2 \le (1 - \bar\mu_i)^2.$

Since $1 \pm \bar\mu_i \ge 0$ this in turn reduces to (5). Therefore, since by hypothesis (5) holds,
so does (12), and hence $\mathcal{M}_T^\kappa \subseteq \mathcal{M}$.

Now we show that $\mathcal{M} \subseteq \mathcal{M}_T^\kappa$ by proving that for $K \in \mathcal{M}$ a parameter $\omega$ in (9)
exists which satisfies constraints defining $\Omega_T$ and $K = \psi_T(\omega)$. Let $P = f_{p\kappa}^{-1}(K)$;
then from (10) we know that $\mathrm{Det}\,P \ge 0$. So consider separately the two situations:
first when $\mathrm{Det}\,P = 0$ and second when $\mathrm{Det}\,P > 0$. In the first case again from
(10) necessarily $\mu_{123} = 0$. The inequality (11) therefore implies that at least two
covariances are zero. If all the covariances are zero then for $\eta_{h,1} = \eta_{h,2} = \eta_{h,3} = 0$
and $\bar\mu_h^2 = 1$ we obtain a valid choice of parameters in (9) and the values satisfy (5).
When one covariance, say $\mu_{12} \ne 0$, is non-zero then if a choice of parameters exists
it has to satisfy $\bar\mu_h^2 \ne 1$, $\eta_{h,1}, \eta_{h,2} \ne 0$ and $\eta_{h,3} = 0$. Such a choice of parameters
will exist if we can ensure that $\mu_{12} = \frac{1}{4}(1 - \bar\mu_h^2)\,\eta_{h,1}\eta_{h,2}$. This follows from Corollary
2 in [ ] which states that if only $\mu_{12} \ne 0$ then there always exists a choice of
parameters for model $X_1 \perp\!\!\!\perp X_2 \mid H$, where $H$ is hidden.

Assume now that $\mathrm{Det}\,P > 0$ which by (11) implies that $\mu_{ij} \ne 0$ for each $i <
j = 1,2,3$. Set $\bar\mu_h^2 = \mu_{123}^2/\mathrm{Det}\,P$ and $\eta_{h,i}^2 = \mathrm{Det}\,P/\mu_{jk}^2$ for $i = 1,2,3$. It follows that

$\left(\frac{1}{4}(1 - \bar\mu_h^2)\right)^2 \eta_{h,i}^2\eta_{h,j}^2 = \mu_{ij}^2$ for $i,j = 1,2,3$ and $\left(\frac{1}{4}(1 - \bar\mu_h^2)\right)^2 \bar\mu_h^2\,\eta_{h,1}^2\eta_{h,2}^2\eta_{h,3}^2 = \mu_{123}^2,$

which coincides with (9) modulo the sign. It can be easily shown that $\mu_{12}\mu_{13}\mu_{23} > 0$
implies that there exists a choice of signs for $\eta_{h,i}$ for $i = 1, 2, 3$ such that

$\frac{1}{4}(1 - \bar\mu_h^2)\,\eta_{h,i}\eta_{h,j} = \mu_{ij}$

for all $1 \le i < j \le 3$ as in (9). For example set $\mathrm{sgn}(\eta_{h,i}) = \mathrm{sgn}(\mu_{jk})$ and use
the fact that by our assumption $\mathrm{sgn}(\mu_{ij}) = \mathrm{sgn}(\mu_{ik})\,\mathrm{sgn}(\mu_{jk})$. This choice of signs
already determines the sign of $\bar\mu_h$ so that

$\frac{1}{4}(1 - \bar\mu_h^2)\,\bar\mu_h\eta_{h,1}\eta_{h,2}\eta_{h,3} = \mu_{123}$

holds.

It remains to show that parameters set in this way satisfy the constraints defining
$\Omega_T$. First note that since $0 \le 4\mu_{12}\mu_{13}\mu_{23} \le \mathrm{Det}\,P$ then $\bar\mu_h^2 \in [0, 1]$ as required.
From Appendix D in [27] we know that if $(\eta_{h,1}, \eta_{h,2}, \eta_{h,3}, \bar\mu_h)$ is one choice of
parameters then there exists only one alternative choice and it is $(-\eta_{h,1}, -\eta_{h,2}, -\eta_{h,3}, -\bar\mu_h)$.
For each $i = 1, 2, 3$ we check if $(1 - \bar\mu_h^2)\,\eta_{h,i}$ satisfies (5). For a fixed $i = 1, 2, 3$ one
easily checks that $(\eta_{h,i}, \bar\mu_h)$ satisfies (5) if and only if $(-\eta_{h,i}, -\bar\mu_h)$ does. Therefore
one can assume that $\eta_{h,i} = \sqrt{\mathrm{Det}\,P/\mu_{jk}^2} > 0$. In this case $\bar\mu_h = s(j,k)\,\mu_{123}/\sqrt{\mathrm{Det}\,P}$ where
$s(j, k) = \mathrm{sgn}(\mu_{jk})$. From this it follows that (5) is satisfied if and only if

(18) $4\mu_{12}\mu_{13}\mu_{23} \le (1 \pm \bar\mu_i)\left(\sqrt{\mu_{jk}^2\,\mathrm{Det}\,P} \mp \mu_{jk}\mu_{123}\right).$

We show that (10) and (12) already imply that this inequality has to be satisfied.
Multiply both sides by $\sqrt{\mu_{jk}^2\,\mathrm{Det}\,P} \pm \mu_{jk}\mu_{123}$ and then divide by $4\mu_{12}\mu_{13}\mu_{23}$ (both
expressions are strictly positive) to obtain

(19) $\sqrt{\mu_{jk}^2\,\mathrm{Det}\,P} \pm \mu_{jk}\mu_{123} \le (1 \pm \bar\mu_i)\,\mu_{jk}^2.$

Since $\mathrm{Det}\,P > 0$ and $\mu_{jk}^2 > 0$ this is equivalent to

(20) $0 \le (1 \pm \bar\mu_i)\,\mu_{jk}^2 \mp \mu_{jk}\mu_{123},\qquad \mathrm{Det}\,P \le ((1 \pm \bar\mu_i)\mu_{jk} \mp \mu_{123})^2.$

The second inequality in (20) is exactly (12). So we have to show that the first
inequality is already implied by (10) and (12). To see this rewrite (12) as

$4\mu_{12}\mu_{13}\mu_{23} \le (1 \pm \bar\mu_i)^2\mu_{jk}^2 \mp 2\mu_{jk}\mu_{123}(1 \pm \bar\mu_i).$

However, since $\mu_{12}\mu_{13}\mu_{23} \ge 0$ the right hand side of the inequality above has to be
non-negative as well. In particular

(21) $(1 - \bar\mu_i)\,\mu_{jk}^2 \ge -2\mu_{123}\mu_{jk},\qquad (1 + \bar\mu_i)\,\mu_{jk}^2 \ge 2\mu_{123}\mu_{jk},$

noting that the left-hand sides are nonnegative. For each of the two inequalities
if the right-hand side is negative then the inequality is trivially satisfied. If the
right-hand side is nonnegative then in the first case $-2\mu_{123}\mu_{jk} \ge -\mu_{123}\mu_{jk}$ and
in the second case $2\mu_{123}\mu_{jk} \ge \mu_{123}\mu_{jk}$. Hence the following set of inequalities is
implied by (21)

$(1 - \bar\mu_i)\,\mu_{jk}^2 \ge -\mu_{123}\mu_{jk},\qquad (1 + \bar\mu_i)\,\mu_{jk}^2 \ge \mu_{123}\mu_{jk}.$

This is exactly the first inequality in (20) which shows that it is implied by (10)
and (12). Consequently, (18) and hence also (5) are satisfied. It follows that
$\mathcal{M} \subseteq \mathcal{M}_T^\kappa$. □
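
The statement of Lemma 3.2 can be checked numerically on a concrete table. The Python sketch below is an illustration under stated assumptions: all function names and the tolerance are ours and hypothetical, and the tripod parameters are arbitrary example values. It builds a $2 \times 2 \times 2$ table from a tripod parametrization, compares the hyperdeterminant of Definition 3.1 with the right-hand side of (8), and evaluates the inequalities (10), (11) and (12). The last call uses the moment values of the point considered in Example 4.5.

```python
from itertools import product


def tripod_table(p_h, cond):
    """2x2x2 table of (X1, X2, X3) for a tripod with hidden node H.

    p_h  : P(H = 1)
    cond : three pairs (P(X_i = 1 | H = 0), P(X_i = 1 | H = 1))
    """
    table = {}
    for x in product((0, 1), repeat=3):
        total = 0.0
        for h in (0, 1):
            weight = p_h if h else 1 - p_h
            for x_i, pair in zip(x, cond):
                weight *= pair[h] if x_i else 1 - pair[h]
            total += weight
        table[x] = total
    return table


def hyperdeterminant(a):
    """Hyperdeterminant of Definition 3.1; a maps triples in {0,1}^3 to entries."""
    g = lambda s: a[tuple(int(c) for c in s)]
    squares = sum((g(u) * g(v)) ** 2 for u, v in
                  [("000", "111"), ("001", "110"), ("010", "101"), ("011", "100")])
    mixed = sum(g(s) * g(t) * g(u) * g(v) for s, t, u, v in [
        ("000", "001", "110", "111"), ("000", "010", "101", "111"),
        ("000", "011", "100", "111"), ("001", "010", "101", "110"),
        ("001", "011", "110", "100"), ("010", "011", "101", "100")])
    quartic = (g("000") * g("011") * g("101") * g("110")
               + g("001") * g("010") * g("100") * g("111"))
    return squares - 2 * mixed + 4 * quartic


def moments(a):
    """Return (mu_bar, m12, m13, m23, m123) computed from the table a."""
    mean = [sum(p * x[i] for x, p in a.items()) for i in range(3)]

    def central(subset):
        total = 0.0
        for x, p in a.items():
            term = p
            for i in subset:
                term *= x[i] - mean[i]
            total += term
        return total

    mu_bar = [1 - 2 * m for m in mean]
    return mu_bar, central((0, 1)), central((0, 2)), central((1, 2)), central((0, 1, 2))


def check_lemma_3_2(mu_bar, m12, m13, m23, m123, tol=1e-12):
    """Evaluate inequalities (10), (11), (12), with Det P taken from identity (8)."""
    det = m123 ** 2 + 4 * m12 * m13 * m23
    opposite = [m23, m13, m12]          # mu_jk for the pair {j, k} complementary to i
    ok10 = m12 * m13 * m23 >= -tol
    lower = (m12 * m13) ** 2 + (m12 * m23) ** 2 + (m13 * m23) ** 2
    ok11 = lower <= det + tol and det <= min(m12 ** 2, m13 ** 2, m23 ** 2) + tol
    ok12 = all(
        det <= min(((1 + mu_bar[i]) * opposite[i] - m123) ** 2,
                   ((1 - mu_bar[i]) * opposite[i] + m123) ** 2) + tol
        for i in range(3))
    return ok10, ok11, ok12


table = tripod_table(0.8, [(0.3, 0.8), (0.3, 0.7), (0.3, 0.8)])
mu_bar, m12, m13, m23, m123 = moments(table)
print(abs(hyperdeterminant(table) - (m123 ** 2 + 4 * m12 * m13 * m23)) < 1e-12)
print(check_lemma_3_2(mu_bar, m12, m13, m23, m123))
print(check_lemma_3_2([0.7, 0.7, 0.7], 0.0625, 0.0625, 0.0625, 0.0526))
```

For the table generated from the tripod all three inequalities hold, as the lemma requires. For the second point (10) and (11) hold but (12) fails, so inequalities involving only second order moments do not detect that it lies outside the model.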

4. A connection with tree metrics

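Now let $T$ be a general tree with $n$ leaves. Before stating the main theorem
of the paper we first show how to obtain an elegant set of necessary constraints
on $\mathcal{M}_T$. In this section we assume that $\bar\mu_v^2 \ne 1$ for all $v \in V$ (c.f. Remark
4.7). Since $\mathrm{Var}(Y_u) = \frac{1}{4}(1 - \bar\mu_u^2)$ the correlation between $Y_u$ and $Y_v$ is defined as
$\rho_{uv} = \mathrm{Cov}(Y_u, Y_v)/\sqrt{\mathrm{Var}(Y_u)\mathrm{Var}(Y_v)}$ which gives

(22) $\rho_{uv} = \eta_{u,v}\sqrt{\frac{1 - \bar\mu_u^2}{1 - \bar\mu_v^2}}.$

Lemma 4.1. For any $i,j \in [n]$ let $E(ij)$ be the set of edges on the unique path
joining $i$ and $j$ in $T$. Then

(23) $\rho_{ij} = \prod_{(u,v) \in E(ij)} \rho_{uv}$

for each probability distribution in $\mathcal{M}_T^\kappa$ such that all the correlations are well defined.

Proof. By (7) applied to $T(ij)$ we have $\mu_{ij} = \frac{1}{4}(1 - \bar\mu_r^2) \prod_{(u,v) \in E(ij)} \eta_{u,v}$, where $r$
is the root of the path between $i$ and $j$ and hence

$\rho_{ij} = \frac{1 - \bar\mu_r^2}{\sqrt{(1 - \bar\mu_i^2)(1 - \bar\mu_j^2)}} \prod_{(u,v) \in E(ij)} \eta_{u,v}.$

Now apply (22) to each $\eta_{u,v}$ in the product above to show (23). □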
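
The product formula (23) can be illustrated numerically on a single path. The Python sketch below is a minimal illustration, assuming a path rooted at one of its ends (which Remark 2.5 permits) and using hypothetical transition probabilities; the function names `chain_joint` and `correlation` are ours. It enumerates the joint distribution of a binary Markov chain and compares the end-to-end correlation with the product of the edge correlations.

```python
from itertools import product
from math import prod, sqrt


def chain_joint(p_first, transitions):
    """Joint law of a binary Markov chain Y_0 -> Y_1 -> ... -> Y_m.

    p_first     : P(Y_0 = 1)
    transitions : list of pairs (P(Y_t = 1 | Y_{t-1} = 0), P(Y_t = 1 | Y_{t-1} = 1))
    """
    joint = {}
    for y in product((0, 1), repeat=len(transitions) + 1):
        weight = p_first if y[0] else 1 - p_first
        for t, (q0, q1) in enumerate(transitions, start=1):
            q = q1 if y[t - 1] else q0
            weight *= q if y[t] else 1 - q
        joint[y] = weight
    return joint


def correlation(joint, u, v):
    """Correlation between the binary coordinates u and v of the chain."""
    expect = lambda f: sum(w * f(y) for y, w in joint.items())
    mean_u, mean_v = expect(lambda y: y[u]), expect(lambda y: y[v])
    cov = expect(lambda y: y[u] * y[v]) - mean_u * mean_v
    return cov / sqrt(mean_u * (1 - mean_u) * mean_v * (1 - mean_v))


# Hypothetical path with three edges.
joint = chain_joint(0.7, [(0.3, 0.8), (0.3, 0.8), (0.3, 0.7)])
end_to_end = correlation(joint, 0, 3)
edge_product = prod(correlation(joint, t, t + 1) for t in range(3))
print(end_to_end, edge_product)
```

The two printed numbers coincide up to floating-point error, which is the content of (23) for this path.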
The above equation allows us to demonstrate an interesting reformulation of our
problem in terms of tree metrics (c.f. [21, Section 7]) which we explain below (see
also Cavender [ ]).

Definition 4.2. An arbitrary function $\delta : [n] \times [n] \to \mathbb{R}$ is called a tree metric if
there exists a tree $T = (V, E)$ with the set of leaves given by $[n]$ and with a positive
real-valued weighting $w : E \to \mathbb{R}_{>0}$ such that for all $i,j \in [n]$

$\delta(i,j) = \begin{cases} \sum_{e \in E(ij)} w(e), & \text{if } i \ne j, \\ 0, & \text{otherwise.} \end{cases}$

Let $d : V \times V \to \mathbb{R}$ be a map defined as

$d(k,l) = \begin{cases} -\log(\rho_{kl}^2), & \text{for all } k,l \in V \text{ such that } \rho_{kl} \ne 0, \\ +\infty, & \text{otherwise,} \end{cases}$

then $d(k, l) \ge 0$ because $\rho_{kl}^2 \le 1$ and $d(k, k) = 0$ for all $k \in V$ since $\rho_{kk} = 1$. If $K \in
\mathcal{M}_T^\kappa$ then by (23) $\rho_{ij}^2 = \prod_{e \in E(ij)} \rho_e^2$ and we can define a map $d_{(T;K)} : [n] \times [n] \to \mathbb{R}$

(24) $d_{(T;K)}(i,j) = \begin{cases} \sum_{(u,v) \in E(ij)} d(u,v), & \text{if } i \ne j, \\ 0, & \text{otherwise.} \end{cases}$

This map is a tree metric by Definition 4.2. In our case we have a point in the
model space defining all the second order correlations and $d_{(T;K)}(i,j)$ for $i,j \in [n]$.
The question is: What are the conditions for the "distances" between leaves so that
there exists a tree $T$ and edge lengths $d(u, v)$ for all $(u, v) \in E$ such that (24) is
satisfied? Or equivalently: What are the conditions on the absolute values of the
second order correlations in order that $\rho_{ij}^2 = \prod_{e \in E(ij)} \rho_e^2$ (for some edge correlations)
is satisfied? We have the following theorem.

Theorem 4.3 (Tree-Metric Theorem, Buneman [ ]). A function $\delta : [n] \times [n] \to \mathbb{R}$
is a tree metric on $[n]$ if and only if for every four (not necessarily distinct) elements
$i,j,k,l \in [n]$,

$\delta(i,j) + \delta(k,l) \le \max\{\delta(i,k) + \delta(j,l),\ \delta(i,l) + \delta(j,k)\}.$

Moreover, a tree metric defines the tree uniquely.
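
The four-point condition of Theorem 4.3 is straightforward to test by enumeration. The following Python sketch is illustrative only: the function name `satisfies_four_point`, the tolerance, and the two example distance matrices are assumptions made here. The input is taken to be a symmetric matrix with zero diagonal.

```python
from itertools import product


def satisfies_four_point(delta, tol=1e-12):
    """Check the four-point condition over all (not necessarily distinct) quadruples."""
    n = len(delta)
    for i, j, k, l in product(range(n), repeat=4):
        lhs = delta[i][j] + delta[k][l]
        rhs = max(delta[i][k] + delta[j][l], delta[i][l] + delta[j][k])
        if lhs > rhs + tol:
            return False
    return True


# Distances on a quartet tree with cherries {0, 1} and {2, 3} and all edge lengths 1.
quartet = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]

# Shortest-path distances on a 4-cycle, which is not a tree.
cycle = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]

print(satisfies_four_point(quartet), satisfies_four_point(cycle))
```

The quartet distances pass the test, whereas the cycle distances fail it for the quadruple $(0, 2, 1, 3)$, where the left-hand side equals 4 and the right-hand side equals 2.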

The question we may now ask is: for a given assignment of edge weights on a
tree metric, which of these correspond to a probability model on the tree defined as
the image of (2)? From Lemma 4.1 we have seen that the tree metric itself induces
(using Definition 4.2) some necessary conditions related to the four-point condition.
Since $\delta(i,j) = -\log(\rho_{ij}^2)$ these constraints in terms of correlations translate into

$-\log(\rho_{ij}^2\rho_{kl}^2) \le -\min\{\log(\rho_{ik}^2\rho_{jl}^2),\ \log(\rho_{il}^2\rho_{jk}^2)\}.$

Since log is a monotone function we obtain

(25) $\min\left\{\frac{\rho_{ik}^2\rho_{jl}^2}{\rho_{ij}^2\rho_{kl}^2},\ \frac{\rho_{il}^2\rho_{jk}^2}{\rho_{ij}^2\rho_{kl}^2}\right\} = \min\left\{\frac{\mu_{ik}^2\mu_{jl}^2}{\mu_{ij}^2\mu_{kl}^2},\ \frac{\mu_{il}^2\mu_{jk}^2}{\mu_{ij}^2\mu_{kl}^2}\right\} \le 1$

for all not necessarily distinct leaves $i,j,k,l \in [n]$. However later in Theorem 4.6
we show that these constraints are not the only active constraints on the model
$\mathcal{M}_T$.

Before we present this theorem it is helpful to make some simple observations
about the relationship between correlations and probabilistic tree models. Since
$\rho_{uv}$ can have different signs we define a signed tree metric as a tree metric with
an additional sign assignment for each edge of $T$. There are additional natural
constraints which assure that there exists a choice of signs for edge correlations
such that (23) is satisfied.

Lemma 4.4. Let T be a tree with n leaves and let s : E { — 1,1} be the map 
assigning signs to edges ofT. Suppose that we have a map a : [ri] x [n] — >■ {—1, 1}. 
Then there exists a map s as defined above such that for all i,j G [n] 

(26) a(i,j)= n s{u,v) 

if and only if for all triples i,j, k € [n] a{i,j)a{i, k)a{ j, k) = 1. 

Proof. First assume that there exists the map s as in the statement. It induces 
a map s : V xV ^ {—1,1} (we use the same notation) such that s{k,l) = 
Y[{u v)eE(ki) ^'-'^ triple i, j, k there exists a unique inner node h which is 

the intersection of all three paths between i,j,k. By the above equation the choice 
of signs for all s{u,v) (u, u) € E gives s{i,h),s{j,h) and s{k,h). Since s{i,j) = 
s{i, h)s{j, h) and the same for the two other pairs, we get that s{i,j)s{i, k)s{j, k) = 
s^{i,h)s'^{j,h)s'^{k,h) = 1 and the result follows since by construction a-{i,j) = 
s(i,j) for all i,j G [n]. 

To prove the converse we use an inductive argument with respect to number of 
hidden nodes. Note that whenever there is a path E{uv) in T such that all its inner 
nodes have degree two then a sign assignment satisfying (26) exists if and only if 
there exists a sign assignment for the same tree but with E{uv) contracted to a 
single edge (u, w). Hence we can assume that the degree of each inner node is at 
least three. First we will show that the theorem is true for trees with one inner node 
(star trees). In this case we will use induction with respect to number of leaves. 
The theorem is true for the tripod tree what can be checked directly. Assume it 
works for all star trees with k < m — 1 leaves and let T be a star tree with m 
leaves. By assumption for any three leaves i,j,k: a{i, j)a{i, k)a{j, k) = 1. If we 
consider a subtree with (1, /i) deleted then by induction assumption we can find a 
consistent choice of signs for all remaining edge correlations. A choice of a sign for 
(1, h) consistent with (26) exists if for alH > 2 a{l, i) = s(l, h)s{i, h). This is true if 
either (t(1, i)s{i, h) = 1 for all i or a{l, i)s{i, h) = —1 for all i. Assume it is not true, 
i.e. there exist two leaves i,j such that a{l,i)s{i,h) = 1 and a(l, j)s{j,h) = — 1. 
Then in particular since a{i,j) = s{i,h)s{j,h) we have a{l,i)a{l, j)a{i, j) = — 1 
which contradicts our assumption. 

If the number of inner nodes is greater than one, pick an inner node $h$
adjacent to exactly one inner node. Let $h'$ be the inner node adjacent to $h$ and let
$I$ be the subset of leaves which are adjacent to $h$. Let $1 \in I$ and consider the subtree
$T'$ obtained by removing all leaves in $I$ and their incident edges, apart from $1$ and
$(h,1)$. By induction, since $h$ has degree two in the resulting subtree, we can
find signs for all edge correlations of $T'$. Set $s(h,h') = 1$; then $s(h,1) = s(h',1)$ and
we need only show that we can identify $s(h,i)$ for all $i \in I \setminus \{1\}$. Let $j,k$ be any
two leaves not in $I$ and let $i \in I$. Using exactly the same argument as in the
analysis of the star tree case we can now show by contradiction that for each $i \in I$
there exists an assignment of $s(i,h)s(h,h') = s(i,h)$. □
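
The star-tree step of this argument can be sketched in code. The following is only an illustration and not part of the original proof: the function name `star_tree_signs` and the dictionary encoding of $\sigma$ are assumptions made here. It fixes $s(1,h) = 1$ and reads off $s(i,h) = \sigma(1,i)$; the triple condition is what makes the remaining pairs consistent.

```python
from itertools import combinations

def star_tree_signs(sigma, n):
    """Recover edge signs s(i, h) on a star tree with leaves 1..n.

    sigma maps frozenset({i, j}) to +1 or -1. Returns a dict i -> s(i, h),
    or None when some triple violates sigma(i,j)*sigma(i,k)*sigma(j,k) = 1.
    """
    for i, j, k in combinations(range(1, n + 1), 3):
        prod = (sigma[frozenset((i, j))] * sigma[frozenset((i, k))]
                * sigma[frozenset((j, k))])
        if prod != 1:
            return None
    # Fix the sign of the edge (1, h); the global flip s -> -s works as well.
    s = {1: 1}
    for i in range(2, n + 1):
        s[i] = sigma[frozenset((1, i))]
    return s

# Hypothetical example: pairwise signs generated from known edge signs.
true_s = {1: 1, 2: -1, 3: -1, 4: 1}
sigma = {frozenset((i, j)): true_s[i] * true_s[j]
         for i, j in combinations(true_s, 2)}
recovered = star_tree_signs(sigma, 4)
consistent = all(sigma[frozenset((i, j))] == recovered[i] * recovered[j]
                 for i, j in combinations(range(1, 5), 2))
```

The assignment is unique only up to multiplying every edge sign by $-1$, which mirrors the label switching at the hidden node.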

The lemma implies that for all $i,j,k \in [n]$ necessarily $\rho_{ij}\rho_{ik}\rho_{jk} \geq 0$, or equivalently
that for all $i,j,k \in [n]$ necessarily $\mu_{ij}\mu_{ik}\mu_{jk} \geq 0$. This in particular implies
a corresponding nonnegativity constraint for all $i,j,k,l \in [n]$. By taking the square root in (25) these
constraints can be combined and rearranged to give the inequalities (27), which hold
for all (not necessarily distinct) $i,j,k,l \in [n]$. We have now obtained the complete
set of inequality constraints on $\mathcal{M}_T$ that involve only second order moments in
their expression. However, the fact that additional constraints involving higher
order moments exist is illustrated in the following simple example.

Example 4.5. Consider the tripod tree model in Lemma 3.2. Let $K$ be a point in
$\mathcal{K}_T$ given by $\bar\mu_i = 0.7$ for $i = 1,2,3$, $\mu_{ij} = 0.0625$ (or equivalently $\rho_{ij} = 0.49$) for
each $i < j$ and $\mu_{123} = 0.0526$. This point lies in the space of tree cumulants $\mathcal{K}_T$,
which can be checked by mapping back the central moments to probabilities, since
the resulting vector $[p_\alpha]$ lies in $\Delta_7$.

Clearly $K$ satisfies all the tree metric constraints in (27). The equation (4.1)
is satisfied with $\bar\mu_i = 0.7$ for each $i = 1,2,3$. We now show that despite this
$K \notin \mathcal{M}_T^\kappa$. For if $K \in \mathcal{M}_T^\kappa$ we could find $\bar\mu_h$ and $\eta_{h,i}$ satisfying the constraints in (5) so
that (9) held. Using the formulas in Corollary 11 in [27] it is easy to compute that
$\bar\mu_h = 0.86$ and $\eta_{h,i} \approx 0.98$. To confirm this, substitute these values into (9) to check that
the equations are satisfied. However, $K$ is not in the model since these parameters
do not lie in $\Omega_T$. Indeed,

\[ (1 - \bar\mu_h^2)\,\eta_{h,i} \approx 0.255 > (1 + \bar\mu_i)(1 - \bar\mu_h) = 0.238 \]

and hence (5) is not satisfied.

The consequence of the fact that the parameters do not lie in $\Omega_T$ is that this
parametrization does not lead to a valid assignment of conditional probabilities
to the edges of the tree. For example, with the numbers given above we can
calculate that the induced marginal distribution for $(X_i, H)$ would have to satisfy
$P(X_i = 0, H = 1) \approx -0.0043$, which is obviously not a consistent assignment for
a probability model. Thus there must exist other constraints involving observed
higher order moments that need to hold for a probability model to be valid. We
note that for the tripod tree these were given by Lemma 3.2.
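
The numbers in Example 4.5 can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: it takes $\mathrm{Det}\,P^{123} = \mu_{123}^2 + 4\mu_{12}\mu_{13}\mu_{23}$, $\bar\mu_h^2 = \mu_{123}^2/\mathrm{Det}\,P^{123}$ and $\eta_{h,i}^2 = \mathrm{Det}\,P^{123}/\mu_{jk}^2$, which is how these quantities are used in Section 5 and Appendix B; the variable names are chosen here for illustration.

```python
import math

# Values quoted in Example 4.5.
mu_bar_i = 0.7
mu_ij = 0.0625
mu_123 = 0.0526

# Assumed form of the hyperdeterminant in central moments.
det = mu_123 ** 2 + 4 * mu_ij ** 3

# Positive roots of the squared parameters (signs are fixed separately).
mu_bar_h = math.sqrt(mu_123 ** 2 / det)
eta_hi = math.sqrt(det / mu_ij ** 2)

lhs = (1 - mu_bar_h ** 2) * eta_hi          # left side of the failing bound
rhs = (1 + mu_bar_i) * (1 - mu_bar_h)       # right side of the failing bound
violates_constraint_5 = lhs > rhs
```

With these inputs `mu_bar_h` and `eta_hi` round to the quoted 0.86 and 0.98, and `lhs` exceeds `rhs`, in agreement with the displayed inequality.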

In Appendix B we prove the following theorem, which gives the complete set of
constraints that have to be satisfied by tree cumulants to lie in $\mathcal{M}_T^\kappa$ in the case
when $T$ is a trivalent tree. For the statement of this result we need the following
definitions. For each edge $e$ of $T$ we define the edge split $(A)(B)$ as the partition
of the set of leaves into two subsets $A$ and $B$ which correspond to the two connected
components of $T$ obtained after removing $e$. For example, in Example 2.1 removing
$(r,a)$ induces $(12)(34)$. Moreover, let $P \in \Delta_{2^n-1}$ be the probability distribution of
the vector $(X_1,\ldots,X_n)$; then for any $i,j,k \in [n]$, $P^{ijk}$ denotes the $2\times2\times2$ table of
the marginal distribution of $(X_i,X_j,X_k)$.

Theorem 4.6. Let $T = (V,E)$ be a trivalent tree with $n$ leaves. Let $\mathcal{M}_T \subseteq \Delta_{2^n-1}$
be the model defined as the image of the parametrization in (2) and $\mathcal{M}_T^\kappa = f_{p\kappa}(\mathcal{M}_T)$.
Suppose $P$ is a joint probability distribution on $n$ binary variables and $K = f_{p\kappa}(P)$.
Then $K \in \mathcal{M}_T^\kappa$ (or equivalently $P \in \mathcal{M}_T$) if and only if the following five conditions
hold:

(C1): For each edge split $(A)(B)$, whenever we have four nonempty subsets
(not necessarily disjoint) $I_1, I_2 \subseteq A$, $J_1, J_2 \subseteq B$, then we must have that

\[ \kappa_{I_1J_1}\kappa_{I_2J_2} - \kappa_{I_1J_2}\kappa_{I_2J_1} = 0. \]

(C2): For all $1 \leq i < j < k \leq n$ we have

\[ \mu_{ij}\mu_{ik}\mu_{jk} \geq 0 \]

and

\[ \mu_{ij}^2\mu_{ik}^2 + \mu_{ij}^2\mu_{jk}^2 + \mu_{ik}^2\mu_{jk}^2 \leq \mathrm{Det}\,P^{ijk} \leq \min\{\mu_{ij}^2, \mu_{ik}^2, \mu_{jk}^2\}. \]

(C3): For all $1 \leq i < j < k \leq n$

\[ \mathrm{Det}\,P^{ijk} \leq \big((1 \pm \bar\mu_{\sigma(i)})\mu_{\sigma(j)\sigma(k)} \mp \mu_{ijk}\big)^2 \]

for all three permutations $\sigma$ of $\{i,j,k\}$ such that $\sigma(j) < \sigma(k)$.

(C4): For all $I \subseteq [n]$, if there exist $i,j \in I$ such that $\mu_{ij} = 0$ then $\kappa_I = 0$.

(C5): For any $i,j,k,l \in [n]$ such that there exists $e \in E$ inducing a split
$(A)(B)$ such that $i,j \in A$ and $k,l \in B$ we have
Remark 4.7. In phylogenetic analysis it is often assumed that $\eta_{u,v} > 0$ for all
$(u,v) \in E$ and $\bar\mu_v^2 \neq 1$ for all $v \in V$ (c.f. assumptions (M1)-(M3) in Section 8.2
and Section 8.4 in [ ]). In this case $\mu_{ij}\mu_{ik}\mu_{jk} > 0$ for all $i,j,k \in [n]$ and the
second constraint in (C2) is not active. Moreover, the model is globally identified
(c.f. Appendix D in [27]).

5. Example: The quartet tree model 

We can check that, modulo numerical error, the point $K \in \mathcal{K}_T$ provided in the table
from Example 2.3 satisfies all the constraints in Theorem 4.6. To check (C1) note
for example that

\[ \kappa_{13}\kappa_{24} - \kappa_{14}\kappa_{23} = 0.0160 \cdot 0.0128 - 0.0160 \cdot 0.0128 = 0, \]

\[ \kappa_{123}\kappa_{134} - \kappa_{1234}\kappa_{13} = (-0.00384)\cdot(-0.00256) - 0.0006144 \cdot 0.016 = 1.6941 \cdot 10^{-21} \approx 0. \]

The last equation shows that, due to the limited precision of numerical software,
the equations in (C1) will typically not be satisfied exactly.

To check (C2) check for example that $\mathrm{Det}\,P^{123} \approx 4.096 \cdot 10^{-5}$, $\min\{\mu_{12}^2, \mu_{13}^2, \mu_{23}^2\} \approx
1.6384 \cdot 10^{-4}$ and $\mu_{12}^2\mu_{13}^2 + \mu_{12}^2\mu_{23}^2 + \mu_{13}^2\mu_{23}^2 \approx 4.7186 \cdot 10^{-7}$. For (C3) again we check
only one of all the constraints. One has

\[ ((1 \pm \bar\mu_1)\mu_{23} \mp \mu_{123})^2 = \{1.3271 \cdot 10^{-4},\ 1.9825 \cdot 10^{-4}\} \]

\[ ((1 \pm \bar\mu_2)\mu_{13} \mp \mu_{123})^2 = 2.5600 \cdot 10^{-4} \]

\[ ((1 \pm \bar\mu_3)\mu_{12} \mp \mu_{123})^2 = \{9.4372 \cdot 10^{-4},\ 0.0011\} \]

and hence

\[ \mathrm{Det}\,P^{123} \approx 4.096 \cdot 10^{-5} \leq \min\{((1 \pm \bar\mu_{\sigma(i)})\mu_{\sigma(j)\sigma(k)} \mp \mu_{ijk})^2\} \approx 1.3271 \cdot 10^{-4} \]

is satisfied. In a similar way we check (C5).
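
The checks of (C2) and (C3) for the triple $\{1,2,3\}$ can be repeated numerically. The sketch below is illustrative only. The values $\mu_{13} = 0.016$, $\mu_{23} = 0.0128$ and $\mu_{123} = -0.00384$ are quoted above; $\mu_{12} = 0.032$ and $\bar\mu = (-0.4, -0.24, -0.16)$ are not quoted in this section but are back-solved here from the printed squares, so they should be treated as assumptions. The function names are hypothetical.

```python
def det_p(mu_ij, mu_ik, mu_jk, mu_ijk):
    """Hyperdeterminant in central moments, as used in this section."""
    return mu_ijk ** 2 + 4 * mu_ij * mu_ik * mu_jk

def check_c2(mu_ij, mu_ik, mu_jk, mu_ijk, tol=1e-12):
    """Sign condition plus lower and upper bound on Det P^{ijk}."""
    d = det_p(mu_ij, mu_ik, mu_jk, mu_ijk)
    sq = [mu_ij ** 2, mu_ik ** 2, mu_jk ** 2]
    lower = sq[0] * sq[1] + sq[0] * sq[2] + sq[1] * sq[2]
    return (mu_ij * mu_ik * mu_jk >= -tol) and (lower - tol <= d <= min(sq) + tol)

def c3_bounds(mu_bar, opposite_cov, mu_ijk):
    """Both sign choices of ((1 +- mu_bar_t) * cov - + mu_ijk)^2 for each leaf t."""
    out = []
    for m, c in zip(mu_bar, opposite_cov):
        out.append(((1 + m) * c - mu_ijk) ** 2)
        out.append(((1 - m) * c + mu_ijk) ** 2)
    return out

mu12, mu13, mu23, mu123 = 0.032, 0.016, 0.0128, -0.00384   # mu12 back-solved
mu_bar = (-0.4, -0.24, -0.16)                               # back-solved

det_123 = det_p(mu12, mu13, mu23, mu123)
c2_ok = check_c2(mu12, mu13, mu23, mu123)
bounds = c3_bounds(mu_bar, (mu23, mu13, mu12), mu123)
c3_ok = det_123 <= min(bounds)
mu_bar_r_sq = mu123 ** 2 / det_123
```

Under these inputs `det_123` matches the quoted $4.096 \cdot 10^{-5}$ and the smallest (C3) bound matches $1.3271 \cdot 10^{-4}$.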
If we have only tree cumulants we can still identify the parameters of the model
up to label switching on the inner nodes using Corollary 11 in [27]. For example

\[ \bar\mu_r^2 = \frac{\mu_{123}^2}{\mu_{123}^2 + 4\mu_{12}\mu_{13}\mu_{23}} = 0.36, \]

\[ \eta_{r,1}^2 = \frac{\mu_{123}^2 + 4\mu_{12}\mu_{13}\mu_{23}}{\mu_{23}^2}, \]

\[ \eta_{r,a}^2 = \frac{\mu_{14}^2}{\mu_{12}^2} \cdot \frac{\mu_{123}^2 + 4\mu_{12}\mu_{13}\mu_{23}}{\mu_{134}^2 + 4\mu_{13}\mu_{14}\mu_{34}}. \]

Note that the numbers on the right hand side of the three equations above do not
depend on the choices made. For example, by Corollary 11 in [27], to compute $\bar\mu_r^2$ we
can use any three leaves separated by $r$. In the formula above we used 1, 2, 3 but
we could also use 1, 2, 4. In both cases we get the same result, namely that

\[ \bar\mu_r^2 = \frac{\mu_{124}^2}{\mu_{124}^2 + 4\mu_{12}\mu_{14}\mu_{24}} = 0.36. \]

If (C1) is not satisfied but (C2)-(C5) hold then the numbers depend on the choices
of leaves. However, the solutions of the equations can be obtained from Corollary 11
in [27] and any of these choices gives feasible values for the parameters in $\Omega_T$.

From the point of view of the original motivation behind the paper a different
scenario is of interest. Imagine that we have $K \in \mathcal{K}_T$ such that all the equations
in (C1) are satisfied, i.e. all the phylogenetic invariants hold. If one of the
constraints in (C2)-(C5) fails then $K \notin \mathcal{M}_T^\kappa$. This shows that the method of
phylogenetic invariants as it is commonly used may lead to spurious results. For example,
consider the sample proportions and the corresponding tree cumulants in the table
below.



$\alpha$    $I$           $p_\alpha$    $\lambda_I$    $\kappa_I$
0000    $\emptyset$   0.0755    1.0000     1.0000
0001    4             0.0483    0.5800     0
0010    3             0.0483    0.5800     0
0011    34            0.0579    0.3700     0.0336
0100    2             0.0479    0.6200     0
0101    24            0.0399    0.3724     0.0128
0110    23            0.0399    0.3724     0.0128
0111    234           0.0623    0.2422     -0.0020
1000    1             0.0171    0.5800     0
1001    14            0.0315    0.3716     0.0352
1010    13            0.0315    0.3716     0.0352
1011    134           0.0699    0.2498     -0.0056
1100    12            0.0695    0.4300     0.0704
1101    124           0.0903    0.2702     -0.0084
1110    123           0.0903    0.2702     -0.0084
1111    1234          0.1799    0.1799     0.0014



It can be checked that for this point all the equations in (C1) are satisfied. However,
it is not in the model. Using the formulas in Corollary 11 in [27] it is simple to confirm
that the point $\theta$ mapping to $K$ has a coordinate equal to $-0.3$. This cannot therefore be a
probability and so $\theta \notin \Theta_T$.
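
One way to see numerically that this point leaves the model is to recompute the low order central moments from the tabulated probabilities and test an inequality directly. The sketch below is an illustration, not the computation referred to in the text: it checks one equation of (C1) and the upper bound $\mathrm{Det}\,P^{123} \leq \min\{\mu_{12}^2,\mu_{13}^2,\mu_{23}^2\}$ from (C2), assuming $\mathrm{Det}\,P^{ijk} = \mu_{ijk}^2 + 4\mu_{ij}\mu_{ik}\mu_{jk}$ as in Section 5. The helper `central_moment` is a name introduced here.

```python
from itertools import product

# p_alpha from the table, in the order 0000, 0001, ..., 1111.
p_values = [0.0755, 0.0483, 0.0483, 0.0579, 0.0479, 0.0399, 0.0399, 0.0623,
            0.0171, 0.0315, 0.0315, 0.0699, 0.0695, 0.0903, 0.0903, 0.1799]
p = dict(zip(product((0, 1), repeat=4), p_values))

def central_moment(p, idx):
    """E[prod_{i in idx} (X_i - E X_i)] with leaves numbered from 1."""
    mean = {i: sum(q for x, q in p.items() if x[i - 1] == 1) for i in idx}
    total = 0.0
    for x, q in p.items():
        term = q
        for i in idx:
            term *= x[i - 1] - mean[i]
        total += term
    return total

pairs_and_triples = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (1, 2, 3)]
mu = {I: central_moment(p, I) for I in pairs_and_triples}

# One of the (C1) equations for the split (12)(34).
c1_residual = mu[(1, 3)] * mu[(2, 4)] - mu[(1, 4)] * mu[(2, 3)]

# Upper bound in (C2) for the triple {1, 2, 3}.
det_123 = mu[(1, 2, 3)] ** 2 + 4 * mu[(1, 2)] * mu[(1, 3)] * mu[(2, 3)]
upper = min(mu[(1, 2)] ** 2, mu[(1, 3)] ** 2, mu[(2, 3)] ** 2)
c2_upper_violated = det_123 > upper
```

The (C1) residual vanishes up to rounding while the (C2) upper bound fails, so the invariants alone do not detect that the point is outside the model.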
6. Phylogenetic invariants

In a seminal paper Allman and Rhodes [ ] identified equations defining the general
Markov model in the case when $T$ is a trivalent tree. In this section we relate their
results to ours. To introduce their main theorem we need the following definition.

Definition 6.1. Let $X = (X_1,\ldots,X_n)$ be a vector of binary random variables
and let $P = (p_\gamma)_{\gamma \in \{0,1\}^n}$ be a $2 \times \cdots \times 2$ table of the joint distribution of $X$. Let
$(A)(B)$ form a partition of $[n]$. Then the flattening of $P$ induced by the partition
is a matrix

\[ P_{(A)(B)} = [p_{\alpha\beta}], \quad \alpha \in \{0,1\}^{|A|},\ \beta \in \{0,1\}^{|B|}, \]

where $p_{\alpha\beta} = P(X_A = \alpha, X_B = \beta)$. Let $T = (V,E)$ be a tree. In particular, for
each $e \in E$, removing the edge $e$ from $E$ induces a partition of the set of leaves into
two subsets corresponding to the two connected components of the resulting forest.
They called this flattening an edge flattening and we denote it by $P_e$.

Note that whenever we implicitly use some order on coordinates indexed by
$\{0,1\}$-sequences we always mean the order induced by the lexicographic order on
$\{0,1\}$-sequences such that $0\cdots00 > 0\cdots01 > \cdots > 1\cdots11$.

If $P$ is the joint distribution of $X = (X_1,\ldots,X_n)$ then each of its flattenings is
just a matrix representation of the joint distribution $P$ and contains essentially the
same probabilistic data. However, these different representations contain important
geometric information about the model.

Theorem 6.2 (Allman, Rhodes [ ]). Let $T$ be a trivalent tree rooted in $r$ and $\mathcal{M}_T$
be the general Markov model on $T$ as defined by (2). Then the smallest algebraic
variety, i.e. a subset of a real space defined by a finite set of polynomial equations,
containing the general Markov model is defined by the vanishing of all $3 \times 3$ minors of all
the edge flattenings of $P$ together with the trivial polynomial equation $\sum_\alpha p_\alpha = 1$.

Note that the result includes the case of the tripod tree model, since in this case
each edge flattening of the joint probability table is a $2 \times 4$ table, so there are no
$3 \times 3$ minors and hence there are no non-trivial polynomials vanishing on the model.
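
The rank statement behind Theorem 6.2 can be illustrated on the quartet tree. The sketch below is not taken from the paper: it builds a joint table from hypothetical conditional probabilities for a tree with hidden nodes $r$ and $a$, leaves 1, 2 attached to $r$ and leaves 3, 4 attached to $a$, and computes exact ranks of flattenings with rational arithmetic. All function names and parameter values are assumptions made for illustration.

```python
from fractions import Fraction as F
from itertools import product

def quartet_joint(root, trans, leaf):
    """Joint table p[(x1, x2, x3, x4)] of the quartet tree 12|34.

    root[r] = P(H_r = r), trans[r][a] = P(H_a = a | H_r = r),
    leaf[i][h][x] = P(X_i = x | parent of i is in state h).
    """
    p = {}
    for x in product((0, 1), repeat=4):
        total = F(0)
        for r, a in product((0, 1), repeat=2):
            total += (root[r] * trans[r][a]
                      * leaf[1][r][x[0]] * leaf[2][r][x[1]]
                      * leaf[3][a][x[2]] * leaf[4][a][x[3]])
        p[x] = total
    return p

def flattening(p, A, B):
    """Matrix with rows indexed by states of leaves in A and columns by B."""
    n = len(A) + len(B)
    matrix = []
    for alpha in product((0, 1), repeat=len(A)):
        row = []
        for beta in product((0, 1), repeat=len(B)):
            x = [0] * n
            for i, v in zip(A, alpha):
                x[i - 1] = v
            for j, v in zip(B, beta):
                x[j - 1] = v
            row.append(p[tuple(x)])
        matrix.append(row)
    return matrix

def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    rk = 0
    for c in range(len(m[0])):
        piv = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][c] / m[rk][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

# Hypothetical parameter values.
root = [F(2, 5), F(3, 5)]
trans = [[F(4, 5), F(1, 5)], [F(3, 10), F(7, 10)]]
leaf = {1: [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]],
        2: [[F(7, 10), F(3, 10)], [F(2, 5), F(3, 5)]],
        3: [[F(4, 5), F(1, 5)], [F(3, 10), F(7, 10)]],
        4: [[F(3, 5), F(2, 5)], [F(1, 10), F(9, 10)]]}

p = quartet_joint(root, trans, leaf)
rank_edge = rank(flattening(p, (1, 2), (3, 4)))      # split induced by (r, a)
rank_non_edge = rank(flattening(p, (1, 3), (2, 4)))  # not an edge split
```

For these parameters the edge flattening has rank 2, so all its $3 \times 3$ minors vanish, while the flattening along the split $(13)(24)$, which is not induced by an edge, has full rank.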

In an analogous way to the edge flattenings of tables representing probability
distributions we can define edge flattenings of $(\kappa_I)_{I \subseteq [n]}$, where $\kappa_\emptyset = 1$ and $\kappa_i = 0$
for all $i \in [n]$ (c.f. Appendix A). Let $e$ be an edge of $T$ inducing a split $(A)(B) \in \Pi_T$
such that $|A| = r$, $|B| = n - r$. Then $\widehat{N}_e$ is a $2^r \times 2^{n-r}$ matrix such that for any
two subsets $I \subseteq A$, $J \subseteq B$ the element of $\widehat{N}_e$ corresponding to the $I$-th row and
the $J$-th column is $\kappa_{IJ}$. Denote by $N_e$ its submatrix given by removing the column
and the row corresponding to the empty subsets of $A$ and $B$. Here the labeling for the
rows and columns is induced by the ordering of the rows and columns for $P_e$ (c.f.
Definition 6.1), i.e. all the subsets of $A$ and $B$ are coded as $\{0,1\}$-vectors and we
introduce the lexicographic order on the vectors with the vector of ones being the
last one.

The following result allows us to rephrase the equations in Theorem 6.2 in terms
of our new coordinates.

Proposition 6.3. Let $T = (V,E)$ be a tree and let $P$ be a probability distribution
of a vector $X = (X_1,\ldots,X_n)$ of binary variables represented by the leaves of $T$. If
$e \in E$ is an edge of $T$ inducing a split $(A_1)(A_2)$ then $\mathrm{rank}(P_e) = 2$ if and only if
$\mathrm{rank}(N_e) = 1$.

Proof. Let $P_e = [p_{\alpha\beta}]$ be the matrix induced by a split $(A_1)(A_2)$. We will show
that $\mathrm{rank}(P_e) = \mathrm{rank}(D_e)$, where $D_e = [d_{IJ}]$ is a block diagonal matrix with 1 as
the first $1 \times 1$ block (i.e. $d_{\emptyset\emptyset} = 1$, $d_{\emptyset J} = 0$, $d_{I\emptyset} = 0$ for all nonempty $I \subseteq A_1$, $J \subseteq A_2$) and
the matrix $N_e$ as the second block. It will then follow that $\mathrm{rank}(P_e) = 2$ if and only
if $\mathrm{rank}(N_e) = 1$.

First note that the flattening matrix $P_e$ can be transformed to the flattening of
the non-central moments just by adding rows and columns according to (29), and
then to the flattening of the central moments $M_e = [\mu_{IJ}]$ with $I \subseteq A_1$, $J \subseteq A_2$
using (30). It therefore suffices to show that $\mathrm{rank}(M_e) = \mathrm{rank}(D_e)$.

Let $I \subseteq A_1$, $J \subseteq A_2$. Then for each $\pi \in \Pi_{T(IJ)}$ there is at most one block
containing elements from both $I$ and $J$. For otherwise removing $e$ would increase
the number of blocks in $\pi$ by more than one, which is not possible. Denote this
block by $(I'J')$ where $I' \subseteq I$, $J' \subseteq J$. Note that by construction either
both $I', J'$ are empty sets, if $\pi \geq (A_1)(A_2)$ in $\Pi_{T(IJ)}$, or both $I', J' \neq \emptyset$ otherwise.
We can rewrite (32) splitting the blocks

\[ \mu_{IJ} = \sum_{\pi \in \Pi_{T(IJ)}} \kappa_{I'J'} \Big(\prod_{I \supseteq B \in \pi} \kappa_B\Big)\Big(\prod_{J \supseteq B \in \pi} \kappa_B\Big). \tag{28} \]

We have $d_{I'J'} = \kappa_{I'J'}$ and it can be further rewritten as

\[ \mu_{IJ} = \sum_{I' \subseteq I} \sum_{J' \subseteq J} u_{II'}\, d_{I'J'}\, v_{J'J}, \]

where $u_{II'} = \sum_{\pi \in \Pi_{T(I \setminus I')}} \prod_{B \in \pi} \kappa_B$ and $v_{J'J} = \sum_{\pi \in \Pi_{T(J \setminus J')}} \prod_{B \in \pi} \kappa_B$. Setting
$u_{II'} = 0$ for $I' \not\subseteq I$ and $v_{J'J} = 0$ for $J' \not\subseteq J$, we can write these coefficients in terms of a
lower triangular matrix $U$ and an upper triangular matrix $V$. Since by construction
$u_{II} = 1$ for all $I \subseteq A_1$ and $v_{JJ} = 1$ for all $J \subseteq A_2$ we have $\det U = \det V = 1$.
Consequently, $M_e$ has the same rank as $D_e$. □

The proposition shows that the vanishing of all $3 \times 3$ minors of all the edge
flattenings of $P$ and the trivial invariant $\sum_\alpha p_\alpha = 1$ are together equivalent to the
vanishing of all $2 \times 2$ minors of all edge flattenings of $\kappa = (\kappa_I)_{I \subseteq [n]}$. An immediate
corollary follows, which gives the equations in (C1) in Theorem 4.6.

Corollary 6.4. Let $T = (V,E)$ be a trivalent tree. Then the smallest algebraic
variety containing $\mathcal{M}_T^\kappa$ is defined by the following set of equations. For each split
$(A)(B)$ induced by an edge consider any four (not necessarily disjoint) nonempty
sets $I_1, I_2 \subseteq A$, $J_1, J_2 \subseteq B$ and the induced equation $\kappa_{I_1J_1}\kappa_{I_2J_2} - \kappa_{I_1J_2}\kappa_{I_2J_1} = 0$.

In [ ] Eriksson noted that some invariants usually prove to be better at
discriminating between different tree topologies than others. His simulations
showed that the invariants related to the four-point condition were especially
powerful. The binary case we consider in this paper can give some partial understanding
of why this might be so. Here, the invariants related to the four-point condition
are only those involving second order covariances (c.f. Section 4). Moreover, the
estimates of the higher-order moments (or cumulants) are sensitive to outliers and
their variance generally grows with the order of the moment. Let $\hat\mu$ be a sample
estimator of the central moments and let $f$ be one of the polynomials in Corollary
6.4 but expressed in terms of the central moments. Then using the delta method
we have

\[ \mathrm{Var}(f(\hat\mu)) \simeq \nabla f(\mu)^t\, \mathrm{Var}(\hat\mu)\, \nabla f(\mu). \]

Consequently, in this loose sense at least, the higher the order of the central
moments and hence tree cumulants, the higher the variability we might expect the
invariant to exhibit (see [IN, Section 4.5]).
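
As a small illustration of the delta method approximation, consider the invariant $f(\mu) = \mu_{13}\mu_{24} - \mu_{14}\mu_{23}$ evaluated at the covariances quoted in Section 5. The covariance matrix of the estimator used below is entirely hypothetical (independent components with a common variance); it only serves to show how the quadratic form is evaluated.

```python
def delta_method_variance(grad, cov):
    """Quadratic form grad^T cov grad for a list gradient and nested-list cov."""
    n = len(grad)
    return sum(grad[a] * cov[a][b] * grad[b]
               for a in range(n) for b in range(n))

# Coordinates ordered as (mu13, mu24, mu14, mu23); values from Section 5.
mu13, mu24, mu14, mu23 = 0.016, 0.0128, 0.016, 0.0128

# Gradient of f = mu13*mu24 - mu14*mu23 with respect to these coordinates.
grad = [mu24, mu13, -mu23, -mu14]

sigma2 = 1e-4   # hypothetical variance of each sample covariance
cov = [[sigma2 if a == b else 0.0 for b in range(4)] for a in range(4)]

var_f = delta_method_variance(grad, cov)
```

For an invariant involving higher order moments the gradient has more entries and the corresponding block of $\mathrm{Var}(\hat\mu)$ typically has larger entries, which is the loose sense in which variability grows with the order.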

7. Discussion 

The new coordinate system proposed in [ ] provides a better insight into the
geometry of phylogenetic tree models with binary observations. The elegant form
of the parameterization is useful and has already enabled us to obtain the full
geometric description of the model class. One of the interesting implications of
this result for phylogenetic tree models is that we could consider different simpler
model classes containing the original one in such a way that the whole evolutionary
interpretation in terms of the tree topologies remains valid. If we are interested only
in the tree we could consider the model defined only by the subset of the constraints
in Theorem 4.6 involving only covariances. The price for this reduction is that
the conditional independencies induced by the original model do not hold anymore,
which in turn affects the interpretation of the model. We note that this approach
is in a similar spirit to that employed to motivate the MAG model class introduced
in [23].

This work has encouraged us to use this reparametrization of this model class to
estimate models within a Bayesian framework. When the sample proportions lie in
the model class then, as we have already noted, the MLEs are given by the formulas
in Corollary 11 in [27]. However, these sample proportions rarely lie in the model
exactly. In a later paper we provide various formal methods for incorporating the
semi-algebraic geometry of a model to improve the prior specification of the tree
model and hence enhance the estimation of the model parameters.

Appendix A. Change of coordinates

First we change our coordinates from the raw probabilities $p = [p_\alpha]$ to the non-central
moments $\lambda = [\lambda_\alpha]$ for $\alpha = (\alpha_1,\ldots,\alpha_n) \in \{0,1\}^n$, where $\lambda_\alpha = E(\prod_{i=1}^n X_i^{\alpha_i})$. This is a
linear map $f_{p\lambda} : \mathbb{R}^{2^n} \to \mathbb{R}^{2^n}$ with determinant equal to one, where the components
$\lambda_\alpha$ of $\lambda = f_{p\lambda}(p)$ are defined by

\[ \lambda_\alpha = \sum_{\alpha \leq \beta \leq 1} p_\beta \quad \text{for any } \alpha \in \{0,1\}^n, \tag{29} \]

where $1$ denotes here the vector of ones and the sum is over all binary vectors $\beta$ such that
$\alpha \leq \beta \leq 1$ in the sense that $\alpha_i \leq \beta_i \leq 1$ for all $i = 1,\ldots,n$. In particular $\lambda_0 = 1$ for all
probability distributions. So the image $f_{p\lambda}(\Delta_{2^n-1})$ is contained in the hyperplane defined
by $\lambda_0 = 1$.
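
The map (29) is easy to state in code. The sketch below is illustrative; the function name `noncentral_moments` and the dictionary representation of the table are choices made here. Each $\lambda_\alpha$ is the sum of $p_\beta$ over all binary vectors $\beta$ that dominate $\alpha$ componentwise.

```python
def noncentral_moments(p):
    """Return lambda_alpha = sum of p_beta over beta >= alpha componentwise.

    p maps binary tuples of equal length to probabilities.
    """
    return {alpha: sum(q for beta, q in p.items()
                       if all(b >= a for a, b in zip(alpha, beta)))
            for alpha in p}

# Hypothetical distribution of two binary variables.
p2 = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
lam = noncentral_moments(p2)
```

Here `lam[(0, 0)]` is the total mass, `lam[(1, 0)]` and `lam[(0, 1)]` are the two means, and `lam[(1, 1)]` is $E(X_1X_2)$.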

The linearity of the expectation implies that the central moments can be expressed in
terms of non-central moments. Define $\mu_\alpha = E(\prod_{i=1}^n U_i^{\alpha_i})$, where $U_i = X_i - EX_i$. Then

\[ \mu_\alpha = \sum_{0 \leq \beta \leq \alpha} (-1)^{|\beta|} \lambda_{\alpha - \beta} \prod_{i=1}^n \lambda_{e_i}^{\beta_i} \quad \text{for } \alpha \in \{0,1\}^n, \tag{30} \]

where $|\beta| = \sum_i \beta_i$. Using these equations we can transform variables from the non-central
moments $\lambda = [\lambda_\alpha]$ to another set of variables given by all the means $\lambda_{e_1},\ldots,\lambda_{e_n}$, where
$e_1,\ldots,e_n$ are the standard basis vectors in $\mathbb{R}^n$, and the central moments $[\mu_\alpha]$ for $\alpha \in \{0,1\}^n$.
The polynomial change of coordinates $f_{\lambda\mu} : \mathbb{R}^{2^n} \to \mathbb{R}^n \times \mathbb{R}^{2^n}$ is the identity on the first
$n$ coordinates corresponding to the means $\lambda_{e_1},\ldots,\lambda_{e_n}$ and is defined on the remaining
coordinates using the equations (30). Denote $\mathcal{C}_n = (f_{\lambda\mu} \circ f_{p\lambda})(\Delta_{2^n-1})$, which is contained
in the subspace of $\mathbb{R}^n \times \mathbb{R}^{2^n}$ given by

\[ \mu_0 = 1 \quad \text{and} \quad \mu_{e_1} = \cdots = \mu_{e_n} = 0. \]

To simplify notation henceforth we will index moments not with $\{0,1\}^n$ but with the set
of subsets of $[n]$. Here the set $A \subseteq [n]$ is identified with $\alpha \in \{0,1\}^n$ such that $\alpha_i = 1$ for
all $i \in A$ and it is zero elsewhere. In particular for each $i \in [n]$ we write $\lambda_i$ for $\lambda_{e_i}$. The
coordinates of $\mathcal{C}_n$ are given by $\lambda_1,\ldots,\lambda_n$ together with $\mu_I$ for all $I \subseteq [n]$ such that $|I| \geq 2$.
Note that the Jacobian of $f_{\lambda\mu} \circ f_{p\lambda} : \Delta_{2^n-1} \to \mathcal{C}_n$ is constant and equal to one.

The final change of coordinates requires some combinatorics. Let $T = (V,E)$ be a tree
with $n$ leaves. A split induced by $e \in E$ is a partition of $[n]$ into two non-empty sets
induced by removing $e$ from $E$ and restricting $[n]$ to the connected components of the
resulting graph. By a multisplit we mean any partition $(B_1)\cdots(B_k)$ of the set of leaves
induced by removing a subset of the set of edges of $T$. Each $B_i$ is called a block of the
partition.

By $\Pi_T$ we denote the partially ordered set (poset) of all multisplits of the set of leaves
induced by edges of $T$. The poset $\Pi_T$ has a unique maximal element induced by removing
all edges in $E$ and the minimal one with no edges removed, which is equal to a single block
$[n]$. The maximal element of a lattice is denoted by $1$ and the minimal one is denoted by
$0$.

For any poset $\Pi$ a Möbius function $m_\Pi : \Pi \times \Pi \to \mathbb{R}$ is defined in such a way that
$m_\Pi(x,x) = 1$ for every $x \in \Pi$, $m_\Pi(x,y) = -\sum_{x \leq z < y} m_\Pi(x,z)$ for $x < y$ in $\Pi$ and it is
zero otherwise (c.f. [24, Section 3.7]). Let $W \subseteq V$; then by $T(W)$ we denote the minimal
subtree of $T$ spanned by $W$. Then $\Pi_{T(W)}$ is the poset of all multisplits of the set of
leaves of $T(W)$ induced by edges of $T(W)$. The Möbius function on this poset is denoted
by $m_W := m_{\Pi_{T(W)}}$. For $\Pi_T$ we write $m := m_{\Pi_T}$. We write $0_W$ and $1_W$ to denote the
minimal and the maximal element of $\Pi_{T(W)}$ respectively.
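
The recursive definition of the Möbius function translates directly into code. The sketch below is generic and illustrative: it works for any finite poset given by a list of hashable elements and an order predicate, and the example uses the subsets of $\{1,2\}$ ordered by inclusion rather than a poset of multisplits. The name `mobius_function` is an assumption of this sketch.

```python
from functools import lru_cache

def mobius_function(elements, leq):
    """Return a function m(x, y) implementing the recursive definition.

    elements: list of hashable poset elements; leq(x, y) is the order relation.
    """
    @lru_cache(maxsize=None)
    def m(x, y):
        if x == y:
            return 1
        if not leq(x, y):
            return 0
        # Sum over all z with x <= z < y.
        return -sum(m(x, z) for z in elements
                    if z != y and leq(x, z) and leq(z, y))
    return m

# Example poset: subsets of {1, 2} ordered by inclusion.
subsets = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]
m = mobius_function(subsets, lambda x, y: x <= y)

bottom, top = frozenset(), frozenset({1, 2})
values = (m(bottom, bottom), m(bottom, frozenset({1})), m(bottom, top))
```

To apply the same function to $\Pi_{T(W)}$ one would encode each multisplit as a frozenset of blocks and supply the ordering of multisplits as `leq`.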

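Consider a map $f_{\mu\kappa} : \mathbb{R}^n \times \mathbb{R}^{2^n} \to \mathbb{R}^n \times \mathbb{R}^{2^n}$ where the coordinates in the domain are
denoted by $\lambda_1,\ldots,\lambda_n$ and $\mu_I$ for $I \subseteq [n]$ and the coordinates in the image are denoted by
$\bar\mu_1,\ldots,\bar\mu_n$ and $\kappa_I$ for $I \subseteq [n]$. The map is defined by $\bar\mu_i = 1 - 2\lambda_i$ for $i = 1,\ldots,n$ and

\[ \kappa_I = \sum_{\pi \in \Pi_{T(I)}} m_I(0_I, \pi) \prod_{B \in \pi} \mu_B \quad \text{for all } I \subseteq [n], \tag{31} \]

where by convention $\kappa_\emptyset = \mu_\emptyset$. Denote $\mathcal{K}_T = f_{\mu\kappa}(\mathcal{C}_n)$. Note that for any $I \subseteq [n]$ such that
$|I| \leq 3$ we have $\kappa_I = \mu_I$. In particular $\mathcal{K}_T$ is contained in the subspace of $\mathbb{R}^n \times \mathbb{R}^{2^n}$ given
by

\[ \kappa_\emptyset = 1, \quad \kappa_1 = \cdots = \kappa_n = 0. \]

Therefore the coordinate system on $\mathcal{K}_T$ is given by $\bar\mu_1,\ldots,\bar\mu_n$ and $\kappa_I$ for $|I| \geq 2$. The
map $f_{\mu\kappa} : \mathcal{C}_n \to \mathcal{K}_T$ is a one-to-one polynomial map with a polynomial inverse $f_{\kappa\mu}$. The
exact form of the inverse map is given by the Möbius inversion formula (c.f. [27])

\[ \mu_I = \sum_{\pi \in \Pi_{T(I)}} \prod_{B \in \pi} \kappa_B \quad \text{for all } I \subseteq [n],\ |I| \geq 2, \tag{32} \]

and $\lambda_i = \frac{1}{2}(1 - \bar\mu_i)$ for $i = 1,\ldots,n$. The Jacobian of $f_{\mu\kappa}$ is constant and equal to $2^n$.
Note that all $f_{p\lambda}$, $f_{\lambda\mu}$, $f_{\mu\kappa}$, after restriction to $\Delta_{2^n-1}$, $f_{p\lambda}(\Delta_{2^n-1})$ and $\mathcal{C}_n$ respectively,
are regular polynomial maps with regular inverses (c.f. [ ], Appendix A). This therefore
implies that there is a regular one-to-one polynomial map between $\Delta_{2^n-1}$ and $\mathcal{K}_T$.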
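
For the quartet tree with split $(12)(34)$ the map (31) can be worked out by hand and checked against the table in Section 5. The reasoning in this sketch is an illustration rather than a quotation: every multisplit other than the single block and $(12)(34)$ contains a singleton block, whose central moment is zero, and the Möbius function equals $1$ on the single block and $-1$ on $(12)(34)$, so that $\kappa_{1234} = \mu_{1234} - \mu_{12}\mu_{34}$. The helper name `central_moment` is introduced here.

```python
from itertools import product

# p_alpha from the table in Section 5, in the order 0000, 0001, ..., 1111.
p_values = [0.0755, 0.0483, 0.0483, 0.0579, 0.0479, 0.0399, 0.0399, 0.0623,
            0.0171, 0.0315, 0.0315, 0.0699, 0.0695, 0.0903, 0.0903, 0.1799]
p = dict(zip(product((0, 1), repeat=4), p_values))

def central_moment(p, idx):
    """E[prod_{i in idx} (X_i - E X_i)] with leaves numbered from 1."""
    mean = {i: sum(q for x, q in p.items() if x[i - 1] == 1) for i in idx}
    total = 0.0
    for x, q in p.items():
        term = q
        for i in idx:
            term *= x[i - 1] - mean[i]
        total += term
    return total

mu_12 = central_moment(p, (1, 2))
mu_34 = central_moment(p, (3, 4))
mu_1234 = central_moment(p, (1, 2, 3, 4))

# Tree cumulant of order four for the quartet tree with split (12)(34).
kappa_1234 = mu_1234 - mu_12 * mu_34
```

Rounded to four decimals, `kappa_1234` reproduces the entry 0.0014 in the last row of that table, and `mu_12`, `mu_34` reproduce the entries 0.0704 and 0.0336.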
Appendix B. The proof of the main theorem

Let $K \in \mathcal{K}_T$ have coordinates given by $\bar\mu_i$ for $i \in [n]$ and $\kappa_I$ for $I \subseteq [n]$ such that
$|I| \geq 2$. Let $K^J$ for $J \subseteq [n]$ denote the projection on the coordinates given by $\bar\mu_i$ for $i \in J$
and $\kappa_I$ for $I \subseteq J$ such that $|I| \geq 2$. It follows directly from the definition of $\mathcal{M}_T$ that
$K \in \mathcal{M}_T^\kappa$ if and only if $K^I \in \mathcal{M}_{T(I)}^\kappa$ for all $I \subseteq [n]$.

Denote by $\mathcal{M}$ the subset of $\mathcal{K}_T$ defined by the constraints in (C1)-(C5). We need to show
that $\mathcal{M} = \mathcal{M}_T^\kappa$. Since the rooting is not relevant we choose an arbitrary inner node as
the root node. We first show that $\mathcal{M}_T^\kappa \subseteq \mathcal{M}$. Let $K \in \mathcal{M}_T^\kappa$ and hence $K = \psi_T(\omega)$ for
some $\omega \in \Omega_T$. The equations in (C1) hold since by construction (c.f. Theorem 6.2) the
variety in $\mathcal{K}_T$ defined by the equations contains the image of $\psi_T$ and hence also $K$. To
show that $K$ satisfies (C2) and (C3) consider the projection $K^{ijk}$ for each $i,j,k \in [n]$. By
Lemma 1 in [27], $\mathcal{M}_{T(ijk)}$ is equal to the tripod tree model. Since $K^{ijk} \in \mathcal{M}_{T(ijk)}^\kappa$, by
Lemma 3.2 (C2) and (C3) must hold. To show that $K$ satisfies (C4) let $i,j \in I$ be such
that $\mu_{ij} = 0$. In this case from (7) $\omega$ is such that either $\eta_{u,v} = 0$ for some $(u,v) \in E(ij)$
or $\bar\mu_{r(ij)}^2 = 1$. In the first case, since $E(ij) \subseteq E(I)$, $\kappa_I = 0$ by (7). In the second case,
if $r(ij) = r(I)$ then again $\kappa_I = 0$ by (7). If $r(ij) \neq r(I)$ then the edge $(v, r(ij))$ pointing
to $r(ij)$ also lies in $E(I)$. By (5) either $\eta_{v,r(ij)} = 0$, in which case we are done, or $\bar\mu_v^2 = 1$,
in which case again there exists an edge in $E(I)$ pointing into $v$. Since the tree is finite,
eventually either $\bar\mu_{r(I)}^2 = 1$ or $\eta_{u,v} = 0$ for some $(u,v) \in E(I)$. This shows that necessarily
$\kappa_I = 0$ and hence $K$ satisfies (C4).

To show that $K$ satisfies (C5) let $i,j,k,l \in [n]$ be the four leaves mentioned in the
condition. Let $u$ and $v$ be two inner nodes such that $u$ separates $i$ from $j$, $v$ separates
$k$ from $l$ and $\{u,v\}$ separates $\{i,j\}$ from $\{k,l\}$. In other words $u, v$ are the only inner
nodes of degree three in $T(ijkl)$. By Lemma 2 in [27] $T(ijkl)$ gives the same model as the
quartet tree with four leaves $i,j,k,l$ and two inner nodes $u, v$. Moreover, by Remark 2.5,
$\mathcal{M}_{T(ijkl)}$ does not depend on the rooting so we can assume that the tree is rooted in $u$.
Since $K^{ijkl} \in \mathcal{M}_{T(ijkl)}^\kappa$, for some parameter choices

\[ \mu_{ik} = \tfrac{1}{4}(1 - \bar\mu_u^2)\,\eta_{u,i}\eta_{u,v}\eta_{v,k}, \qquad \mu_{jl} = \tfrac{1}{4}(1 - \bar\mu_u^2)\,\eta_{u,j}\eta_{u,v}\eta_{v,l}. \]

Substitute these equations into (C5). There are then two cases to consider: $\mu_{uv} > 0$ and
$\mu_{uv} < 0$. Laborious but elementary algebra shows that the condition in (C5) is equivalent
to (5) applied to $(1 - \bar\mu_u^2)\eta_{u,v}$ and hence (C5) holds by definition. Consequently $\mathcal{M}_T^\kappa \subseteq \mathcal{M}$.

We next show that $\mathcal{M} \subseteq \mathcal{M}_T^\kappa$. Let $K \in \mathcal{M}$. We construct a point $\omega_0 \in \mathbb{R}^{|V|+|E|}$ such
that $\omega_0 \in \Omega_T$ and $\psi_T(\omega_0) = K$, i.e. $\omega_0$ is such that for all $I \subseteq [n]$ such that $|I| \geq 2$, $\kappa_I$
can be written in terms of the parameters in $\omega_0$ as in (7).

Case 1: Begin by assuming that $K$ is such that $\mu_{ij} \neq 0$ for all $i,j \in [n]$. We now set
the squares of the values of all the parameters in terms of the observed moments as in Corollary
11 in [27]. We will show that the equations in (7) must hold for their modulus values.
Next we will need to ensure there is at least one assignment of signs for the set of parameters
such that all of (7) hold exactly. Finally we show that the parameter vector $\omega_0$ defined in
this way lies in $\Omega_T$.

For each inner node $h$ of $T$ let $i,j,k \in [n]$ be separated by $h$ in $T$. By (C2) we have
that $\mu_{ij}\mu_{ik}\mu_{jk} > 0$ and hence also that $\mathrm{Det}\,P^{ijk} > 0$. Now set

\[ (\bar\mu_h^0)^2 = \frac{\mu_{ijk}^2}{\mathrm{Det}\,P^{ijk}}. \tag{33} \]

We show that (C1), which $K$ satisfies by assumption, implies that the value of $(\bar\mu_h^0)^2$ does
not depend on the choice of $i,j,k$. It suffices to show that if $k$ is replaced by another leaf
$k'$ such that $i,j,k'$ are separated by $h$ in $T$ then $\frac{\mu_{ijk}^2}{\mathrm{Det}\,P^{ijk}} = \frac{\mu_{ijk'}^2}{\mathrm{Det}\,P^{ijk'}}$. Since $h$ has degree
three in $T$ there exists an edge $e \in E$ inducing a split $(A)(B)$ such that $i,j \in A$ and
$k,k' \in B$. From (C1) it follows that

\[ \mu_{ik}\mu_{jk'} = \mu_{ik'}\mu_{jk}, \quad \mu_{ijk}\mu_{ik'} = \mu_{ijk'}\mu_{ik}, \quad \mu_{ijk}\mu_{jk'} = \mu_{ijk'}\mu_{jk} \tag{34} \]

and consequently

\[ \mathrm{Det}\,P^{ijk}\,\mu_{ij}\mu_{ik'}\mu_{jk'} = \mathrm{Det}\,P^{ijk'}\,\mu_{ij}\mu_{ik}\mu_{jk}, \tag{35} \]

which implies that

\[ \frac{\mu_{ijk}^2}{\mathrm{Det}\,P^{ijk}} = \frac{\mu_{ijk}^2\,\mu_{ij}\mu_{ik'}\mu_{jk'}}{\mathrm{Det}\,P^{ijk}\,\mu_{ij}\mu_{ik'}\mu_{jk'}} = \frac{\mu_{ijk'}^2\,\mu_{ij}\mu_{ik}\mu_{jk}}{\mathrm{Det}\,P^{ijk'}\,\mu_{ij}\mu_{ik}\mu_{jk}} = \frac{\mu_{ijk'}^2}{\mathrm{Det}\,P^{ijk'}} \]

as required.

For terminal edges $(v,i)$ of $T$ such that $i \in [n]$ let $j,k \in [n]$ be any two leaves of $T$ such
that $v$ separates $i,j,k$. Set

\[ (\eta_{v,i}^0)^2 = \frac{\mathrm{Det}\,P^{ijk}}{\mu_{jk}^2}. \tag{36} \]

As in the previous case it is straightforward to check that, given (C1), this value does
not depend on the choice of $j,k$. Without loss of generality assume that instead of $k$ we have $k'$ and
$v$ separates $i,j,k'$ in $T$. Since there exists an edge split such that $i,j$ and $k,k'$ are in
different blocks we have (34) and (35) and consequently

\[ \frac{\mathrm{Det}\,P^{ijk}}{\mu_{jk}^2} = \frac{\mu_{ik}\,\mathrm{Det}\,P^{ijk'}}{\mu_{ik'}\mu_{jk'}\mu_{jk}} = \frac{\mathrm{Det}\,P^{ijk'}}{\mu_{jk'}^2}. \]

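For inner edges $(u,v) \in E$ let $i,j,k,l \in [n]$ be any four leaves such that $u$ separates $i$
from $j$, $v$ separates $k$ from $l$ and $\{u,v\}$ separates $\{i,j\}$ from $\{k,l\}$. Set

\[ (\eta_{u,v}^0)^2 = \frac{\mu_{il}^2\,\mathrm{Det}\,P^{ijk}}{\mu_{ij}^2\,\mathrm{Det}\,P^{ikl}}, \tag{37} \]

which is well-defined since $\mu_{ij}^2$ and $\mathrm{Det}\,P^{ikl}$ are strictly positive. We now show that this
value does not depend on the choice of $k, l$. By symmetry it suffices to show that we
obtain the same value if instead of $l$ we took another leaf $l'$ such that $u, v$ are the only
degree three nodes in $T(ijkl')$. Since $v$ has degree three there must exist an inner
edge separating $i,j,k$ from $l,l'$. From (C1) it follows that

\[ \mu_{il'}\mu_{kl'}\,\mathrm{Det}\,P^{ikl} = \mu_{il}\mu_{kl}\,\mathrm{Det}\,P^{ikl'}, \qquad \mu_{il}\mu_{kl'} = \mu_{il'}\mu_{kl} \]

and hence

\[ \frac{\mu_{il}^2\,\mathrm{Det}\,P^{ijk}}{\mu_{ij}^2\,\mathrm{Det}\,P^{ikl}} = \frac{\mu_{il}^2\,\mu_{il'}\mu_{kl'}\,\mathrm{Det}\,P^{ijk}}{\mu_{ij}^2\,\mu_{il'}\mu_{kl'}\,\mathrm{Det}\,P^{ikl}} = \frac{\mu_{il'}^2\,\mathrm{Det}\,P^{ijk}}{\mu_{ij}^2\,\mathrm{Det}\,P^{ikl'}} \]

as required.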

We now show that the modulus of the equations (7) holds. First consider the case
$I = \{i,j\}$. Label the inner nodes of $E(ij)$ by $v_1,\ldots,v_k$, beginning from the node adjacent
to $i$. For each $s = 1,\ldots,k$ let $i_s$ denote a leaf such that $v_s$ separates $i,j,i_s$ in $T$. We
assume that the root $r(ij)$ of this path is in $v_1$. The analysis is the same for any other
rooting by Remark 2.5. We now proceed to check that

\[ \mu_{ij}^2 = \Big(\tfrac{1}{4}\big(1 - (\bar\mu^0_{r(ij)})^2\big)\Big)^2 \prod_{(u,v) \in E(ij)} (\eta^0_{u,v})^2 = \Big(\tfrac{1}{4}\big(1 - (\bar\mu^0_{v_1})^2\big)\Big)^2 (\eta^0_{v_1,i})^2 \Big(\prod_{s=2}^{k} (\eta^0_{v_{s-1},v_s})^2\Big) (\eta^0_{v_k,j})^2. \tag{38} \]

Since $v_1$ separates $i,j,i_1$ by construction, from (33) we therefore have

\[ \tfrac{1}{4}\big(1 - (\bar\mu^0_{v_1})^2\big) = \frac{\mu_{ij}\mu_{ii_1}\mu_{ji_1}}{\mathrm{Det}\,P^{iji_1}}. \]

Now substitute this equation and all the set values in (36), (37) into the right hand side
of (38). Use the fact that $v_k$ separates $i,j,i_k$ in $T$ and $v_{s-1}, v_s$ are the only degree three
nodes in $T(i\,i_{s-1}\,j\,i_s)$. Since $(v_1,i)$ and $(v_k,j)$ are the only terminal edges we obtain

\[ \Big(\frac{\mu_{ij}\mu_{ii_1}\mu_{ji_1}}{\mathrm{Det}\,P^{iji_1}}\Big)^2 \cdot \frac{\mathrm{Det}\,P^{iji_1}}{\mu_{ji_1}^2} \cdot \Big(\prod_{s=2}^{k} \frac{\mu_{ii_s}^2\,\mathrm{Det}\,P^{iji_{s-1}}}{\mu_{ii_{s-1}}^2\,\mathrm{Det}\,P^{iji_s}}\Big) \cdot \frac{\mathrm{Det}\,P^{iji_k}}{\mu_{ii_k}^2}. \tag{39} \]

It can now be checked that all the expressions with hyperdeterminants cancel out and the
formula reduces to $\mu_{ij}^2$ as required.

Now we need to show that for every $I = \{i,j,k\}$

\[ \mu_{ijk}^2 = \Big(\tfrac{1}{4}\big(1 - (\bar\mu^0_{r(ijk)})^2\big)\Big)^2 (\bar\mu^0_w)^2 \prod_{(u,v) \in E(ijk)} (\eta^0_{u,v})^2, \tag{40} \]

where by $w$ we denote the node separating $i$, $j$ and $k$. Assume that $T(ijk)$ is rooted
somewhere on the path between $i$ and $j$. Using (38) the right hand side of (40) can be
rewritten as

\[ \mu_{ij}^2\,(\bar\mu^0_w)^2 \prod_{(u,v) \in E(wk)} (\eta^0_{u,v})^2. \tag{41} \]

Number the degree three nodes in $E(wk)$ by $v_1,\ldots,v_l$ and let $i_s$ denote a leaf such that
the inner nodes of $T(ijki_s)$ of degree three are exactly $v_{s-1}$ and $v_s$, where $v_0 = w$. By an
exactly analogous argument as in the case above we obtain

(A')\ rr .o^2_^DetP^ /^A mLi». DetP''-^'°-^'' \ DetP"-!"" 

(u,v)eE(n,k) ^^^3 ^ \s=2 ^^^s^2^s^l ^^^^ J f^H-in 

where $i_0 = i$. It can be easily checked that all the hyperdeterminants apart from the term
$\mathrm{Det}\,P^{ijk}$ cancel out. Moreover all the covariances apart from $\mu_{ij}^{-2}$ cancel out as well and
hence (42) is equal to $\mathrm{Det}\,P^{ijk}/\mu_{ij}^2$. Now using the definition of $(\bar\mu^0_w)^2$ in (33) it can be easily
checked that (41) is equal to $\mu_{ijk}^2$, as required.

So far we have confirmed only that the squares of the parameters in $\omega_0$ satisfy the required
equations, at least for the tree cumulants up to the third order. Next, we show that
there exists a consistent choice of signs for these parameters such that the equations are
satisfied exactly. Let $\sigma(i,j) = \mathrm{sgn}(\mu_{ij})$. Since by assumption $\mu_{ij} \neq 0$ for all $i,j \in [n]$,
the conditions in (C2) imply that $\sigma(i,j)\sigma(i,k)\sigma(j,k) = 1$ for all triples $i,j,k \in [n]$.
Hence by Lemma 4.4 there exists a choice $s(u,v) \in \{-1,+1\}$ for all $(u,v) \in E$ such
that $\sigma(i,j) = \prod_{(u,v) \in E(ij)} s(u,v)$ for all $i,j \in [n]$. For any two nodes $k,l \in V$ we define
$s(k,l) = \prod_{(u,v) \in E(kl)} s(u,v)$. A choice of signs for the parameters can be obtained as
follows. For each edge $(u,v) \in E$ we set $\mathrm{sgn}(\eta^0_{u,v}) = s(u,v)$ and for each inner node $v$ we
set $\mathrm{sgn}(\bar\mu^0_v) = \mathrm{sgn}(\mu_{ijk})\,s(v,i)s(v,j)s(v,k)$, where $i,j,k$ are any three leaves of $T$ separated
by $v$.

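Assume now that the choice of the signs of the parameters, induced by $s(u,v)$ for
$(u,v) \in E$, has been made. This choice of signs gives

\[ \bar\mu^0_v = s(v,i)s(v,j)s(v,k)\,\frac{\mu_{ijk}}{\sqrt{\mathrm{Det}\,P^{ijk}}}, \tag{43} \]

\[ \eta^0_{v,i} = s(v,i)\,\frac{\sqrt{\mathrm{Det}\,P^{ijk}}}{|\mu_{jk}|}, \tag{44} \]

\[ \eta^0_{u,v} = s(u,v)\,\left|\frac{\mu_{il}}{\mu_{ij}}\right| \sqrt{\frac{\mathrm{Det}\,P^{ijk}}{\mathrm{Det}\,P^{ikl}}}. \tag{45} \]

Note that in particular with this choice of signs $\mathrm{sgn}(\eta^0_{u,v}) = s(u,v)$ for all $(u,v) \in E$ and
$\mathrm{sgn}(\bar\mu^0_v) = \mathrm{sgn}(\mu_{ijk}) \prod_{(u,v) \in E(ijk)} s(u,v)$. For $\mu_{ij}$ and $\mu_{ijk}$ we have shown that the modulus
of (7) holds. It suffices to check that the signs match. But this follows directly from the
construction. Indeed, in proving (38) we have shown that

\[ |\mu_{ij}| = \tfrac{1}{4}\big(1 - (\bar\mu^0_{r(ij)})^2\big) \prod_{(u,v) \in E(ij)} |\eta^0_{u,v}|. \]

Now multiply both sides by $s(i,j) = \prod_{(u,v) \in E(ij)} s(u,v)$ to get

\[ \mu_{ij} = s(i,j)\,|\mu_{ij}| = \tfrac{1}{4}\big(1 - (\bar\mu^0_{r(ij)})^2\big) \prod_{(u,v) \in E(ij)} s(u,v)\,|\eta^0_{u,v}| \tag{46} \]

\[ \phantom{\mu_{ij}} = \tfrac{1}{4}\big(1 - (\bar\mu^0_{r(ij)})^2\big) \prod_{(u,v) \in E(ij)} \eta^0_{u,v}. \tag{47} \]

Similarly from (40) we have that

\[ |\mu_{ijk}| = \tfrac{1}{4}\big(1 - (\bar\mu^0_{r(ijk)})^2\big)\,|\bar\mu^0_w| \prod_{(u,v) \in E(ijk)} |\eta^0_{u,v}|. \]

Multiply both sides by $\mathrm{sgn}(\mu_{ijk})$ and use the fact that $\big(\prod_{(u,v) \in E(ijk)} s(u,v)\big)^2 = 1$ to get

\[ \mu_{ijk} = \tfrac{1}{4}\big(1 - (\bar\mu^0_{r(ijk)})^2\big) \Big(\mathrm{sgn}(\mu_{ijk}) \prod_{(u,v) \in E(ijk)} s(u,v)\,|\bar\mu^0_w|\Big) \prod_{(u,v) \in E(ijk)} s(u,v)\,|\eta^0_{u,v}| = \tfrac{1}{4}\big(1 - (\bar\mu^0_{r(ijk)})^2\big)\,\bar\mu^0_w \prod_{(u,v) \in E(ijk)} \eta^0_{u,v} \]

as desired.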

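We now show (7) for $|I| \geq 4$ by induction. Let $(u,v) \in E$ be any edge splitting $I$ into
two subsets $I_1$ and $I_2$ such that $|I_2| \geq 2$ and $u$ is the node closer to $I_1$. Let $i \in I_1$ and
$j \in I_2$; then by (C1)

\[ \kappa_{I_1I_2}\,\kappa_{ij} = \kappa_{I_1j}\,\kappa_{iI_2}. \]

By induction we can assume that $\kappa_{I_1j}$, $\kappa_{iI_2}$ and $\kappa_{ij}$ have the form as in (7). Moreover,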
11(11, i,)eE{i/2) ^'"■^ n(u,u)eE(/ii) _ T-r 

ii(u,v)eE(ij)^",^ («,«)SB(i) 

n ^^'"= n 

heN{il2) heN{vl2) 

n M^'^-^- n 

heN{Iij) heN{Iiu) 

Using this we can write 



(48) K/1/2 - ^ Q _ ^2 N 11 Atft 11 J?!..!.- 

V f^r(ij)) heN(I) (u,v)eE(I) 

The root of T{I) is either in T{Iiu) or in T{vl2)- In the first case r{Iij) — r{I) and 
r{il2) = T{ij)- In the second case r[I\j) — r{ij) and r{il2) = ''(/). Hence in both cases 

(1 - M?(ii-2))(1 ~ _ , _2 N 

and (48) has the required form given by (32). It follows that K = tpT{i^o)- 

It now remains to show that the parameters defined in (43), (44) and (45) define a
parameter vector $\omega_0$ which lies in $\Omega_T$. Since by (C2) $\mu_{ijk}^2 \leq \mathrm{Det}P^{ijk}$ for all $i,j,k \in [n]$, it
follows that for all inner nodes $h$ we have $\bar\mu^0_h \in [-1,1]$ as required. For a terminal edge
$(v,i)$ consider the marginal model induced by $T(ijk)$, where $j,k$ are any two leaves such
that $v$ separates $i,j,k$ in $T$. From Lemma 3.2 constraints (C2) and (C3) imply that
$\eta^0_{v,i}$ is a valid parameter. To show that (45) satisfies (5) we write



We now substitute this together with the expressions for $\bar\mu^0_u$ and $\bar\mu^0_v$ given by (43) into (5).
First assume $s(u,v) = 1$; then $s(u,k) = s(v,k)$, $s(v,i) = s(u,i)$ and (5) becomes



^iJ-ijfJ'iklJ'jk 

VDetP^^^DetP^ 



both sides by j\/DetP*J'=DetP*'=' to get 



The left hand side is equal to ^m.^m^^m., , ^> ^ ^ ^^'.-f^' ,^^s(j, I). Now multiply 



4M?feMj; < (y^M|^DetP^±s(7i,/)MjiM»jfc) (^DetP^ =F s(«, OM^fci) ■ 

However, $s(u,l) = s(v,l)$, hence this is satisfied by (C5). It is easily calculated that the
case $s(u,v) = -1$ leads to the same constraint. This finishes the proof in Case 1, when $K$
is such that $\mu_{ij} \neq 0$ for all $i,j \in [n]$.

Case 2: For the general case let $K$ be a tree cumulant and let $\Sigma = [\mu_{ij}] \in \mathbb{R}^{n\times n}$
be the matrix of all covariances between the leaves. We say that an edge $e \in E$ is
isolated relative to $K$ if $\mu_{ij} = 0$ for all $i,j \in [n]$ such that $e \in E(ij)$. By $\hat E \subseteq E$ we denote
the set of all edges of $T$ which are isolated relative to $K$. By $\hat T = (V, E \setminus \hat E)$ we denote
the forest obtained from $T$ by removing the edges in $\hat E$, and we call it the $K$-forest. We define
relations on $\hat E$ and $E \setminus \hat E$. For two edges $e, e'$ with either $\{e,e'\} \subseteq \hat E$ or $\{e,e'\} \subseteq E \setminus \hat E$
write $e \sim e'$ if either $e = e'$ or $e$ and $e'$ are adjacent and all the edges that are incident
with both $e$ and $e'$ are isolated relative to $K$. Let us now take the transitive closure of
$\sim$ restricted to pairs of edges in $\hat E$ to form an equivalence relation on $\hat E$. This transitive
closure is constructed as follows. Consider a graph with nodes representing elements of $\hat E$
and put an edge between $e, e'$ whenever $e \sim e'$. Then the equivalence classes correspond to
connected components of this graph. Similarly, take the transitive closure of $\sim$ restricted
to the pairs of edges in $E \setminus \hat E$ to form an equivalence relation on $E \setminus \hat E$. We will let $[\hat E]$
and $[E \setminus \hat E]$ denote the sets of equivalence classes of $\hat E$ and $E \setminus \hat E$ respectively (for details
see Section 5 in [27]).
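
The set of isolated edges and the connected components of the $K$-forest can be computed directly from the tree and the leaf covariances. The Python sketch below is an illustration only and not part of the original argument: the helper names `path_edges` and `k_forest`, the quartet tree and the covariance values are hypothetical, and covariances are compared with zero exactly, which assumes exact (not estimated) input.

```python
def path_edges(adj, a, b):
    """Edges on the unique path between nodes a and b in a tree."""
    parent = {a: None}
    stack = [a]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                stack.append(y)
    out = []
    while parent[b] is not None:
        out.append(frozenset((b, parent[b])))
        b = parent[b]
    return out

def k_forest(edges, leaves, cov):
    """Return the isolated edges and the node sets of the forest components."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # An edge is not isolated if some leaf pair with nonzero covariance uses it.
    alive = set()
    for a in range(len(leaves)):
        for b in range(a + 1, len(leaves)):
            i, j = leaves[a], leaves[b]
            if cov[(i, j)] != 0:
                alive.update(path_edges(adj, i, j))
    isolated = {frozenset(e) for e in edges} - alive
    # Connected components of the forest obtained by deleting isolated edges.
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if frozenset((x, y)) in alive and y not in comp:
                    comp.add(y)
                    stack.append(y)
        seen |= comp
        components.append(comp)
    return isolated, components

# Hypothetical quartet tree in which only the pairs (1,2) and (3,4) are correlated.
edges = [("a", 1), ("a", 2), ("a", "b"), ("b", 3), ("b", 4)]
leaves = [1, 2, 3, 4]
cov = {(1, 2): 0.05, (3, 4): -0.02, (1, 3): 0.0, (1, 4): 0.0, (2, 3): 0.0, (2, 4): 0.0}
isolated, components = k_forest(edges, leaves, cov)
```

In this example the inner edge is the only isolated edge, and the forest splits into two subtrees whose leaves are $\{1,2\}$ and $\{3,4\}$; on each of them all covariances are nonzero, which is the situation handled by Case 1.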

Again we show that there exists $\omega_0 \in \Omega_T$ such that $\psi_T(\omega_0) = K$. Set $\eta^0_{u,v} = 0$ for all
$(u,v) \in \hat E$ and $\bar\mu^0_v = 0$ for all inner nodes of $T$ with degree zero in $\hat T$. It then follows that
$(1 - (\bar\mu^0_u)^2)\,\eta^0_{u,v} = 0$ satisfies (5) for all $(u,v) \in \hat E$ and $\bar\mu^0_v \in [-1,1]$ for all $v \in V$, and hence
these parameters satisfy the constraints defining $\Omega_T$. If $I \subseteq [n]$ is such that $E(I) \cap \hat E \neq \emptyset$
then $\kappa_I = 0$ by (C4). Hence in this case we can assert that

\[ \kappa_I = \tfrac14\bigl(1 - (\bar\mu^0_{r(I)})^2\bigr) \prod_{v\in N(I)} (\bar\mu^0_v)^{\deg(v)-2} \prod_{(u,v)\in E(I)} \eta^0_{u,v} \]

simply because both sides of this equation are zero. By Lemma 9 (iv) in [27] every
connected component of $\hat T$ is a subtree which is either an inner node or a tree with the
set of leaves contained in $[n]$. Denote the connected subtrees which are not inner nodes
by $T_1, \ldots, T_k$ and their sets of leaves by $[n_l]$ for $l = 1, \ldots, k$. For every $l = 1, \ldots, k$ and all
$i,j \in [n_l]$ we have that $\mu_{ij} \neq 0$. Hence for each $T_l$, applying Case 1, we have $K^{[n_l]} \in \mathcal{M}_{T_l}$.
If $I \subseteq [n]$ is such that $E(I) \cap \hat E = \emptyset$ then $I \subseteq [n_l]$ for some $l = 1, \ldots, k$. Since $K^{[n_l]} \in \mathcal{M}_{T_l}$,
there exists a choice of parameters such that $\kappa_I$ can be written as (7). Consequently
$K \in \mathcal{M}_T$ and we are finished. □